PREFACE


THIS BOOK is intended for all who require a mathematically sound,
but elementary introduction to the theory of probability. 

Probability concepts are now of great importance in a wide variety 
of fields. The theory of probability, as the foundation upon which 
the methods of statistics are based, should command the attention of 
those who want to understand as well as apply statistical techniques. 
Probabilistic theories, making explicit reference to the nature and 
effects of chance phenomena, are the rule rather than the exception 
in the physical and biological sciences. Less well known is the fact 
that probability concepts are finding increased use in the social 
sciences and business: psychologists develop stochastic models for 
learning; economists use the techniques of game theory to discuss 
competition and markets; expected values, variances, and other mat- 
ters related to random variables turn out to be important in the 
problem of finding combinations of securities that best meet the 
needs of the investor; business managers, because their decisions 
must be made in the face of uncertainty, invoke the theory of prob- 
ability as an aid in planning inventory, establishing quality control, 
designing market surveys, etc. We need not go on—it is clear that 
probability concepts and methods are now widely used and will see 
even more extensive use in the future. 

One noteworthy indication of the importance of our subject is the 
recent decision of the Commission on Mathematics of the College
Entrance Examination Board to recommend that a course in prob- 
ability and statistical inference be offered in the twelfth grade of the 
secondary school. Thus, secondary school teachers of mathematics, 
at some point in their college or in-service training, or in summer 
institutes (such as those sponsored by the National Science Foundation),
should achieve some mastery of the elements of probability
theory. Parts of this book were used in courses offered in NSF Insti- 
tutes held at Oberlin College in 1958 and 1959, and the final manu- 
script has benefited from the helpful comments of the many teachers 
who studied preliminary versions. 

Although there are a number of excellent textbooks on probability, 
they are all written for readers who have the mathematical sophisti- 
cation that comes with a working knowledge of the differential and 
integral calculus. It seemed to me worthwhile to bring the theory of 
probability to the attention of those who do not have the calculus 
prerequisite. It was with this aim in mind that I limited myself to 
those topics that are accessible to readers with only a good back- 
ground in high school algebra and a little ability in the reading and 
manipulation of mathematical symbols. The consequent limitation 
to finite sample spaces, although severe, facilitates a careful logical 
treatment of the essentials needed by all who use probability con- 
cepts. Furthermore, I have found that an understanding of the basic 
definitions, theorems, and methods in the finite case makes it much 
easier for students with the necessary preparation to master the 
corresponding ideas in the infinite case. I am therefore hopeful that 
this volume, although written as a basic textbook for courses in 
probability and statistics for students without calculus, will also
prove useful in courses for those who have previous training in
calculus.

One further possible use of this book is worthy of mention here. 
There are many college students who, for one reason or another, can 
take at most one year of mathematics. These students are often 
offered a smorgasbord survey course in which they sample one topic 
after another and learn very little about lots of things. Many teach- 
ers, however, prefer to offer a course centering on a few main topics, 
going into each systematically and deeply enough to give the student 
a reasonable depth of knowledge in the chosen subjects. Although 
many topics vie for inclusion in such a program, I believe a strong 
case can be made for a course that concentrates on sets and prob- 
ability in the finite case at first, proceeds to an introduction to the 
calculus, and then applies this calculus to the elements of probability
in the infinite case. (In my own course, I also include applications
of the differential calculus to simple problems in economics.) Such
a course, if properly executed, can give the student a keen sense 
of the nature and achievements of mathematical thinking, while 
laying a firm foundation for further study in economics, statistics, 
operations research, or allied fields. Such a program would there- 
fore be especially valuable for social science and business students, 
assuming they can devote only a year to mathematics at the college 
level. I have used this volume in preliminary form in roughly the 
first third of such a year course at Oberlin College, with students 
who present less than three years of high school mathematics for 
entrance. Teachers who share my point of view may also find this 
book useful in their own introductory mathematics courses. 

Since the theory of probability is best formulated using the lan- 
guage and notation of sets, we devote the first chapter to the 
elementary mathematics of sets. Proofs of laws in the algebra of 
sets are simplified by the use of so-called membership tables, a 
device analogous to truth tables in logic. Here we also introduce
Cartesian product sets, which are needed at many points throughout 
the book. 

Chapter 2 develops the basic calculus of probability for experi- 
ments with only a finite number of possible outcomes (finite sample 
spaces). A probability measure is first introduced over the events 
of a sample space and then conditional probability, independent 
events, and independent trials are carefully defined. Illustrative 
and problem material is here limited to the simplest experimental 
situations, and more sophisticated combinatorial techniques are 
first treated in Chapter 3. The usual order of topics has been re- 
versed because beginning students seem always to have difficulty 
with the use of permutation and combination formulas, and this 
difficulty often impairs the learning of the basic probability ideas 
when both are presented simultaneously. We present the basic ideas 
first and then, in Chapter 3, offer a set of exercises in which the 
previously mastered probability theory is applied to a wide variety 
of situations requiring the use of sophisticated counting techniques. 
It has been our experience that this procedure makes it considerably 
easier for the student to learn this basic material.

Chapter 4 is an introduction to the analytic theory of probability 
in the finite case. Random variables are defined as functions on 
sample spaces, and probability distributions, means, standard deviations,
joint probability functions, covariance, and correlation are
discussed. Independence of random variables is defined and, with 
these ideas extended to the multivariate case, applications to random 
sampling theory can be included. The sampling distribution of the 
sample mean is discussed and formulas for its mean and variance 
are derived for both sampling with and without replacement. 

The most important probability function defined on a finite 
sample space, the binomial distribution, forms the subject matter 
of the final chapter. The basic properties of a Bernoulli process and 
a binomially distributed random variable are derived, and the use 
of tables of cumulative binomial probabilities is discussed. Applica- 
tions to the testing of statistical hypotheses (significance tests), as 
well as to a more complex problem of decision-making under uncer- 
tainty serve to illustrate how probability methods are applied in 
statistical investigations. 

For some classes, teachers may find it necessary to offer supple- 
mentary lessons on the method of mathematical induction and the 
use of summation signs, as these topics arise in the text. I have
also found that it is wise to constantly remind the beginning student
of the substitution principle, for example, that from Var(X) ≥ 0
for all X it follows that Var(2X − 3Y) ≥ 0. Much of the difficulty
beginners have with mathematics stems from a lack of understanding 
of this principle, and it is well worth emphasis. 

In all other respects, I have made every effort to have this book 
self-contained, clear, and readable. Throughout, stress is laid on the 
explanation of fundamental concepts and patterns of mathematical 
reasoning, as well as on techniques of problem-solving. Problems at 
the end of each section are designed to supplement the many 
worked-out illustrative examples in the text and to enable the 
reader to check his understanding of new definitions, theorems, and 
methods. From time to time, problems are included to challenge the 
better student: the sample variance, maximum likelihood estimation,
the hypergeometric distribution, regression functions, and
OC-curves for sampling inspection are introduced in problems that 
are written so as to guide the student toward an understanding of 
these important topics. Answers (often complete solutions) to half 
of the 360 problems are collected in a 21-page section at the end of 
the book. To facilitate computations, tables of ordinary logarithms,
logarithms of factorials, and cumulative binomial probabilities are
included in the text. A list of books suitable for supplementary
reading appears at the end of each chapter. I trust that these
features will serve to make the hard job of learning a little less hard.
Comments from readers are always welcome. 


SAMUEL GOLDBERG 
Cambridge, Mass. 
Chapter 1


SETS 


1. Examples of sets; basic notation 


The concept of a set, whose fundamental role in mathematics was 
first pointed out in the work of the mathematician Georg Cantor 
(1845-1918), has significantly affected the structure and language of 
modern mathematics. In particular, the mathematical theory of 
probability is now most effectively formulated by using the termi- 
nology and notation of sets. For this reason, we devote Chapter 1 to 
the elementary mathematics of sets. Additional topics in set theory 
are included throughout the text, as the need for this material be- 
comes apparent. 

The notion of a set is sufficiently deep in the foundation of mathe- 
matics to defy being defined (at the level of this book) in terms of 
still more basic concepts. Hence, we can only aim here, by taking 
advantage of the reader’s knowledge of the English language and his 
experience with the real and conceptual world, to make clear the 
denotation of the word “set.” 

A set is merely an aggregate or collection of objects of any sort: 
people, numbers, books, outcomes of experiments, geometrical figures,
etc. Thus, we can speak of the set of all integers, or the set of
all oceans, or the set of all possible sums when two dice are rolled 
and the number of dots on the uppermost faces are added, or the set 
consisting of the cities of Cambridge and Oberlin and all their residents,
or the set of all straight lines (in a given plane) which pass
through a given point. 

The collection of objects must be well-defined, by which we mean
that, for any object whatsoever, the question “Does this object belong
to the collection?” has an unequivocal “yes” or “no” answer.
It is not necessary that we personally have the knowledge required to
decide which answer is correct. We must know only that, of the
answers “yes” and “no,” exactly one is correct.

Let us also agree that no object in a set is counted twice; i.e., the 
objects are distinct. It follows that, when listing the objects in a set,
we do not repeat an object after it is once recorded. For example, 
according to this convention, the set of letters in the word “banana” 
is a set containing not six letters, but rather the three distinct letters 
b, a, and n. 
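
As an informal check of this convention, here is a minimal sketch in Python. The use of Python and its built-in set type is our own illustration and is not part of the text; the names word and letters are chosen only for this example.

```python
# Illustration only: Python's built-in set keeps each distinct object once.
word = "banana"
letters = set(word)        # repeated letters collapse to a single element each

print(sorted(letters))     # the distinct letters, listed alphabetically
print(len(word), len(letters))
```

The word has six letters, but the set built from it has only three elements, in agreement with the convention just adopted.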

The following definition summarizes our discussion to this point 
and introduces some additional terminology and notation. 

Definition 1.1. A set is a well-defined collection of distinct ob- 
jects. The individual objects that collectively make up a given set 
are called its elements, and each element belongs to or is a member of 
or is contained in the set. If a is an object and A a set, then we write 
a ∈ A as an abbreviation for “a is an element of A” and a ∉ A for
“a is not an element of A.” If a set has a finite number of elements, 
then it is called a finite set; otherwise it is called an infinite set. 

We are relying on the reader's knowledge of the positive integers 
1, 2, 3, ..., the so-called counting or natural numbers. This is an
infinite set of numbers. To say that a set is finite means that one 
can enumerate the elements of the set in some order, then count these 
elements one by one until a last element is reached. Let us note that 
it is possible for a set, like the set of grains of sand on the Coney Island
beach, to have a fantastically large number of elements and never- 
theless be a finite set. 

A set is ordinarily specified either by (i) listing all its elements and
enclosing them in braces (the so-called roster method of defining the
set), or by (ii) enclosing in braces a defining property and agreeing
that those objects that have the property, and only those objects, 
are members of the set. We discuss these important ideas further 
and introduce additional notation in the following examples. 


Example 1.1. The set whose elements are the integers 0, 5, and 12
is a finite set with three elements. If we denote this set by A, then it 
is conveniently written using the roster method: A = {0, 5, 12}.
The statements “5 ∈ A” and “6 ∉ A” are both true.


Example 1.2. If we write V = {a, e, i, o, u}, then we have defined
the set V of vowels in the English alphabet by listing its five elements.
To specify V by a defining property we write

V = {x | x is a vowel in the English alphabet},

which is read “V is the set of those elements x such that x is a vowel
in the English alphabet.” Braces are always used when specifying a
set; the vertical bar | is read “such that” or “for which.” The symbol
x is of course merely a place-holder; any other symbol will do just as
well. For example, we can also write

V = {* | * is a vowel in the English alphabet}.

A slight modification of this notation is often used. Let us first
introduce the set A to stand for the set of all letters of the English
alphabet. Then we write

V = {* ∈ A | * is a vowel},

which is read “V is the set of those elements * of A such that * is a
vowel.”
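
The two ways of specifying V can be compared in a short Python sketch. This is our own illustration, not the author's notation: the set comprehension plays the role of the braces and vertical bar, and the helper is_vowel is a hypothetical name that stands in for the defining property.

```python
import string

# Roster method: list the elements explicitly.
V_roster = {"a", "e", "i", "o", "u"}

# The set A of all letters of the English alphabet (lower case only, by assumption).
A = set(string.ascii_lowercase)

def is_vowel(letter):
    # Hypothetical helper: "vowel" means a, e, i, o, or u, as in Example 1.2.
    return letter in "aeiou"

# Defining property: those elements x of A such that x is a vowel.
V_property = {x for x in A if is_vowel(x)}

print(V_roster == V_property)
```

The place-holder x in the comprehension can be renamed freely, just as the symbol x in the set notation can.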


Example 1.3. The set B = {−2, 2} is the same set as
{x ∈ R | x² = 4}, where R is the set of all real numbers. The set
{x ∈ R | x² = −1} has no elements, since the square of any real
number is nonnegative. But if C is the set of all complex numbers,
then {x ∈ C | x² = −1} contains the elements i = √−1 and −i.


Example 1.4. A prime number is a positive integer greater than 1
but divisible only by 1 and itself. A proof of the fact that the set
{p | p is a prime number} is an infinite set was given by Euclid
(c. 330-275 B.C.) in the ninth book of his Elements. Strictly speaking,
the roster method is unavailable for infinite sets, since it is not
possible to list all the members and have explicitly before one a totality
of elements making up an infinite set. The notation

{2, 3, 5, 7, 11, 13, 17, 19, ...},

in which some of the elements of the set are listed followed by three
dots which take the place of et cetera and stand for obviously understood
omissions of one or more elements, is an often used but logically
unsatisfactory way out of this difficulty. (See Problem 1.3.)
To specify an infinite set correctly, one must (as we did when we
introduced the set of prime numbers) cite a defining property of the
set.


Example 1.5. If a rectangular coordinate system (with x-axis and
y-axis) is introduced in a plane, [...] x-coordinate and a y-coordinate
[...] Figure 1(a), by an ordered pair of real numbers. [...]

[Figure 1, panels (a) and (b)]

[...] is interested in sets of points whose coordinates meet certain
re[...]. {(x, y) | y = x} is the set of all [...] x- and y-coordinates.
This infinite set [...] {(x, y) | y = x}. Similarly, [...]
{(x, y) | y = 0}, and [...] {(x, y) | x = 0 and y = 0}. The set
{(x, y) | x > 0 and y > 0} is the set of points whose x- and
y-coordinates are both positive. Thus, the graph of this set is the
entire [...] indicated in Figure 1(b).

We see that a relation (in the form of equalities [...]
between x and y) can be considered a set-sel[...]
[...]tures the set of those points (from among all in the plane) selected by
the requirement that their coordinates satisfy the given relation.
Although it may seem strange at first, it turns out to be convenient
to talk about sets that have no members. 


Definition 1.2. A set with no members is called an empty or null
set.

The set {x ∈ R | x² = −1} in Example 1.3 is an empty set. Another
example is obtained by considering the set of all paths by which the
line drawing of a house in Figure 2 can be traced without lifting one's
pencil or retracing any line segment. Whether this set is empty or not
is of some interest, since to assert that it is empty is to say that the
figure cannot be traced under the prescribed conditions. (Let the reader
convince himself that this set is indeed empty.) As our work develops, we
shall see many other less frivolous reasons for introducing the notion
of an empty set.

[Figure 2: line drawing of a house]

We conclude this ground-breaking section with one more definition. 


Definition 1.3. Two sets A and B are said to be equal and we write 
A = B if and only if they have exactly the same elements. If one 
of the sets has an element not in the other, they are unequal and we 
write A ≠ B.

Thus A = B means that every element of A is also an element of 
B and every element of B is an element of A. Equal sets are identical 
sets, and this identity is symbolized by the equality sign.

This definition has some interesting consequences. First, it is clear
that the order in which we list the elements of a set is immaterial.
For example, the set {a, b, c} is equal to the set {c, a, b}, since they
do indeed have exactly the same three elements. 

Also, when sets are specified by defining properties, they can be 
equal even though the defining properties themselves are outwardly 
different. Thus, the set of all even prime numbers and the set of real 
numbers x such that x + 3 = 5 have different defining properties,
yet they are equal sets, for each contains the number 2 as its only 
element. 
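
Both consequences of Definition 1.3 can be checked in a brief Python sketch. This is our own illustration under a stated assumption: a finite window of integers (the name SEARCH is hypothetical) stands in for the real numbers, which is enough here because the only solution of x + 3 = 5 is an integer.

```python
# Order of listing is immaterial.
print({"a", "b", "c"} == {"c", "a", "b"})

def is_prime(n):
    # Trial division; adequate for the small range searched here.
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Assumption: a finite window of integers stands in for the real numbers.
SEARCH = range(-100, 101)

even_primes = {n for n in SEARCH if is_prime(n) and n % 2 == 0}
solutions = {x for x in SEARCH if x + 3 == 5}

# Different defining properties, yet the same single element.
print(even_primes, solutions, even_primes == solutions)
```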

Up to now, we have been careful to speak of a set having no mem- 
bers as an empty set. But it is clear from Definition 1.3 that any two 
empty sets are equal. For to be unequal it is necessary for one of the 
sets to contain an element not in the other, and this is impossible 
since neither set contains any elements. Therefore we are justified in
referring to the empty set or the null set.* We denote the null set by
the special symbol Ø.


PROBLEMS 


1.1. We list eight sets. For each set, state whether it is finite or infinite. 
If finite, count the number of elements in the set. Where feasible, write
the set using the roster method. 


(a) The set of footnotes in Section 1. 

(b) The set of letters in the word “probability.” 

(c) The set of odd positive integers. 

(d) The set of prime numbers less than one million. 

(e) The set of paths by which the following figure can be traced without
lifting one’s pencil or retracing any line segment:

[Figure 3]


(f) The set of those points (in a given plane) that are exactly five units
from the origin O. 


(g) The set of real numbers satisfying the equation x² − 3x + 2 = 0.
(h) The set of possible outcomes of the experiment in which one card 
is selected from a standard deck of 52 cards. 


1.2. The following paragraph was written by a student impressed with the
technical vocabulary of set theory. Rewrite in more usual English prose.

Let C be the set of Mr. and Mrs. Smith’s children. C was equal to Ø
until March 1, 1958. C contained exactly one element from that date
until March 15, 1959 when it increased its membership by two!


* The following true story concerns the attempt of a well-known professor of
mathematics to teach his five-year-old son the subtle distinction between “a”
(or “an”) and “the.” One day the son answered the telephone, listened a moment
and then said, “I’m sorry, but you have the wrong number.” (Isn’t this what
most of us say when someone dials incorrectly?) The father, having overheard,
immediately called the boy to him and gently instructed, “What you said would
be correct if there were exactly one wrong number. But since there are many
possible wrong numbers, it would be more accurate to say, ‘I’m sorry, but you
have a wrong number.’ ”

1.3. To illustrate the inadequacy of displaying a few elements of a set and
indicating the other elements by three dots, consider the set A of all
numbers of the form

n² + (n − 1)(n − 2)(n − 3),

where n is any positive integer. Show that the first three elements
(i.e., those obtained when n = 1, 2, 3) are 1, 4, and 9, so that one is
tempted to write A = {1, 4, 9, ...}. If A is written this way on an I.Q.
test, we do not hesitate to write the next element as 16. But show that
the next element (obtained when n = 4) is actually 22 and not 16!
Indeed, it is possible to write a defining property for a set so that its 
fourth element (in order of magnitude) is any number, say 94, although 
its first three elements are 1, 4, 9. Formulate such a defining property. 


1.4. Let A = {0, 1, 2, 3, 4}. List the elements, if any, of each of the
following sets:

(a) {x ∈ A | 2x − 4 = 0}              (b) {x ∈ A | x² − 4 = 0}
(c) {x ∈ A | x³ − 4x² + 3x = 0}       (d) {x ∈ A | x = 0}
(e) {x ∈ A | x + 1 > 0}               (f) {x ∈ A | 2x + 1 < 0}
(g) {x ∈ A | x² − 5x + 4 ≥ 0}         (h) {x ∈ A | x² − 2 < 0}


1.5. Let x and y be the coordinates of a point in the plane. Identify the
following sets and give a geometric interpretation of your results:

(a) {(x, y) | x + y = 5 and 3x − y = 3}

(b) {(x, y) | x + y = 5 and 2x + 2y = 3}

(c) {(x, y) | x + y = 5 and 2x + 2y = 10}

1.6. Show that set equality has the following properties:

(i) Set equality is a reflexive relation; i.e., A = A for any set A.

(ii) Set equality is a symmetric relation; i.e., for any sets A and B, if
A = B, then B = A.

(iii) Set equality is a transitive relation; i.e., for any sets A, B, and C,
if A = B and B = C, then A = C.

(Note: A relation that is reflexive, symmetric, and transitive is called
an equivalence relation.)

1.7. Determine whether A = B or A ≠ B.

(a) A = {2, 4, 6}, B = {4, 6, 2}.

(b) A = {1, 2, 3}, B = {Mars, Venus, Jupiter}.

(c) A = {* | * is a plane equilateral triangle}, B = {+ | + is a plane
equiangular triangle}.

(d) A = {x | x² − 2x + 1 = 0}, B = {x | x − 1 = 0}.

(e) A = {x | 2x² − 5x + 2 = 0}, B = {x | 2x³ − 5x² + 2x = 0}.

1.8. Which of the following are true? Explain.
(a) 2 = {2}, (b) 2 ∈ {2}, (c) 0 = Ø, (d) 0 ∈ Ø.


2. Subsets 


Each element of the set of vowels in the alphabet is, of course, an 
element of the set of all letters. Similarly, each number in {2, 4, 6}
is an element of the set of all even integers, and each real number in
{x | x > 3} is also in {x | x > 0}. In this section, we discuss the
simple but important relation between sets illustrated by these ex- 
amples. 


Definition 2.1. A set A is a subset of set B, denoted by A ⊆ B,
if each element of A is also an element of B. We agree to call the null
set Ø a subset of every set.


For example, we write {1, 3} ⊆ {1, 2, 3}, since each of the two
elements in {1, 3} belongs to {1, 2, 3}. Also, {1, 3} ⊆ {x | x ≥ 1}
and {1, 3} ⊆ {1, 3}. The definition of subset implies that a set is a
subset of itself; i.e., A ⊆ A is always true. We can express this fact
using the language introduced in Problem 1.6 by saying that set
inclusion (i.e., one set being a subset of another set) is a reflexive
relation. It is also transitive, for if A ⊆ B and B ⊆ C, it follows that
A ⊆ C. But set inclusion is not symmetric. As a counterexample, let
A = {a} and B = {a, b}. Then A ⊆ B is true, but B ⊆ A is false.
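
These properties of set inclusion can be spot-checked on particular sets with a small Python sketch. It is our own illustration: Python's `<=` operator on sets tests the subset relation, and the sets C and D are introduced only for this example. A check on specific sets illustrates the properties but does not prove them.

```python
A = {"a"}
B = {"a", "b"}
C = {"a", "b", "c"}        # extra set, introduced only for this illustration

print(A <= A)                       # reflexive: A is a subset of itself
print(A <= B and B <= C, A <= C)    # transitive, in this instance
print(A <= B, B <= A)               # not symmetric: True, then False
print(set() <= A)                   # the null set is a subset of every set

# Equality as two-way inclusion.
D = {"b", "a"}
print((B <= D and D <= B) == (B == D))
```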

It is noteworthy that the definition of set equality in the preceding
section was formulated in terms of the subset relation. In fact, it is
merely a restatement of Definition 1.3 to say that A = B if and only
if A ⊆ B and B ⊆ A.


TABLE 1

Set A           n(A)   Subsets of A                               Number of subsets of A

Ø               0      Ø                                          1 (= 2⁰)
{a}             1      Ø, {a}                                     2 (= 2¹)
{a, b}          2      Ø, {a}, {b}, {a, b}                        4 (= 2²)
{a, b, c}       3      Ø, {a}, {b}, {c}, {a, b}, {a, c},          8 (= 2³)
                       {b, c}, {a, b, c}
{a, b, c, d}    4      Ø, {a}, {b}, {c}, {d}, {a, b}, {a, c},     16 (= 2⁴)
                       {a, d}, {b, c}, {b, d}, {c, d}, {a, b, c},
                       {a, b, d}, {a, c, d}, {b, c, d},
                       {a, b, c, d}

Table 1 illustrates the notion of subset, and also directs our at- 
tention to a formula relating the number of subsets of a set to the 
number of elements in the set. We denote the number of elements in 
A by n(A). 

From the numbers in the last column of this table, we are led to 
conjecture that if n is any nonnegative integer, then a set with n elements
has 2ⁿ subsets. Before proving this result is true, we need to
enunciate a principle that is at the heart of most counting procedures, 
and that is used time and again in computing probabilities. 


Fundamental Principle of Counting: 


(a) If one task can be completed in N₁ different ways and, following
this, another task can be completed in N₂ different ways, then both
tasks can be completed in the given order in N₁N₂ different ways.

(b) More generally, suppose a certain job can be done by completing,
in some specified order, n smaller units (which we shall call
tasks), where n is any positive integer. The first task can be completed
in N₁ different ways. Having finished the first task, the second
can be completed in N₂ different ways. Having finished the first two
tasks, the third can be completed in N₃ different ways. And so on
until, having finished all but the last task, this nth task can be completed
in Nₙ different ways. Then the entire job can be done in
N₁N₂N₃ ··· Nₙ different ways, it being understood that two ways of
doing the job are considered different if and only if there is at least
one task that is completed differently in the two ways.


The tree-diagram in Figure 4 illustrates (a) for the special case
N₁ = 3 and N₂ = 2. Starting from some point, we draw N₁ = 3 lines.
From each of these lines, we draw N₂ = 2 lines. The total number of
ways of completing task 1 and then task 2 is the same as the total
number of branches in the tree.

[Figure 4: tree diagram for N₁ = 3 and N₂ = 2]


When there are only two tasks, as in (a), the fundamental principle
follows immediately from the definition of multiplication. For
each of the N₂ ways of doing task 2, we have N₁ ways of first doing
task 1. Hence both tasks can be done in a number of ways equal to
N₁ + N₁ + ... + N₁, where there are N₂ summands. But this number
is precisely the product N₁N₂.

The general principle in (b) can be proved by mathematical in- 
duction. We leave this for Problem 2.10 and proceed to illustrate 
how one uses the fundamental principle of counting. 


Example 2.1. We roll a green die and then a red die. How many 
ways can these dice come up? Our job can be thought of as recording 
the results of the two rolls. This can be done by first recording the 
number on the green die (task 1), and then recording the number on 
the red die (task 2). Task 1 can be done in six ways, and then task
2 can also be done in six ways. Hence, there are 6 · 6 = 36 possible
ways that the two dice can come up. 
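
The count in Example 2.1 can be confirmed by listing the outcomes directly. The following Python sketch is our own illustration; itertools.product from the standard library forms every (green, red) pair, and the names faces and outcomes are chosen only for this example.

```python
from itertools import product

faces = range(1, 7)                        # the six faces of a die

# Task 1: record the green die; task 2: record the red die.
outcomes = list(product(faces, faces))     # every (green, red) pair

print(len(outcomes))
```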


Example 2.2. How many distinct three-letter “words” can be 
made, using the letters chosen from among those of “number,” but 
with no letter used more than once in a “word”? Our job is to con- 
struct a three-letter “word” under the prescribed conditions. This 
job can be done by selecting the first letter (task 1), then the second 
letter (task 2), and finally the third letter (task 3). Task 1 can be 
done in any of six ways, since there are six letters available in 
"number." Having chosen one letter, there are only five remaining 
letters, and hence only five ways of completing task 2. Similarly, 
there are four ways of completing task 3 after the first two letters are 
chosen. Hence there are altogether 6 · 5 · 4 = 120 different
three-letter "words" that can be formed.
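
As a check on Example 2.2, the "words" can be generated and counted. This Python sketch is our own illustration; itertools.permutations(letters, 3) yields each ordered selection of three different letters, which corresponds to completing tasks 1, 2, and 3 in turn.

```python
from itertools import permutations

letters = "number"          # six distinct letters

# Each ordered choice of three different letters is one "word".
words = {"".join(p) for p in permutations(letters, 3)}

print(len(words))
```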


Example 2.3. How many different four-of-a-kind poker hands are
there? Our job is to select a hand (subset) of five cards from the
ordinary deck (set) of 52 cards in such a way that the hand contains
four cards with the same face-value. This job can be done by completing
the following tasks in the stated order: (i) Choose one face-value
from among the 13 possible face-values; (ii) Select four cards
from among those with the face-value chosen in (i), paying no regard
to their order; (iii) Choose one card from among the remaining 48
cards. Each time we complete the job this way, we obtain exactly
one four-of-a-kind poker hand. Moreover, different ways of completing
the job result in different four-of-a-kind poker hands. (It
was to make this last assertion true that we described task (ii) as
choosing a set of four cards, and were not concerned with the order in
which the cards were selected.) Hence, there are as many different
four-of-a-kind poker hands as there are different ways of completing
the job. Now, task (i) can be done in 13 ways, task (ii) can be done
in only one way (since there is only one set of four cards that can be
formed from four given cards), and task (iii) can be done in 48 ways.
Hence, there are 13 · 1 · 48 = 624 different four-of-a-kind poker
hands.
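
The three tasks of Example 2.3 can be carried out mechanically. The Python sketch below is our own illustration; the labels used for face-values and suits are assumptions made for the example. Each hand is stored as a frozenset, so that the order of the cards is disregarded, as the example requires.

```python
from itertools import product

# Assumed labels for the 13 face-values and 4 suits of an ordinary deck.
values = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(values, suits))               # 52 (value, suit) cards

hands = set()
for v in values:                                  # task (i): 13 ways
    four = frozenset((v, s) for s in suits)       # task (ii): only one way
    for card in deck:                             # task (iii): 48 ways
        if card[0] != v:
            hands.add(four | {card})              # a hand is a set of 5 cards

print(len(hands))
```

Because the hands are collected in a set, any two ways of completing the job that produced the same hand would be counted once; the total still agrees with 13 · 1 · 48.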


We shall return to the fundamental principle of counting in Chap- 
ter 3, since it is the basic result from which the formulas of combin- 
atorial analysis are derived. Our main interest here is to use the 
principle to establish the following theorem. 

Theorem 2.1. Let n be any nonnegative integer. If A is a set with
n elements, then there are 2ⁿ different subsets of A.

Proof. If n = 0, then A = ∅ and the only subset of ∅ is itself.
Since 2^0 = 1, the theorem is true in this special case. If n ≥ 1, then
let the n elements of A be enumerated in some order. The job of
constructing a subset of A can be viewed as made up of the following
n tasks. As task 1 we decide whether the first element of A should or
should not be an element of the subset. If we decide it should, then
let us write down an ∈; if we decide it should not, then we write ∉.
Then, as task 2, we write ∈ or ∉, depending upon whether we decide
that the second element of A should or should not belong to the
subset. Now we move to the third element of A and complete the third
task in a similar manner. Since A has n elements, we complete n
tasks, and thus obtain a sequence of n decisions, each symbolized by
∈ or ∉. For example, if A = {a, b, c, d}, then the sequence ∈∈∉∈
determines the subset {a, b, d}, the sequence ∈∈∈∈ determines the subset A
itself, the sequence ∉∉∉∉ determines the empty subset ∅. In general,
there are as many subsets of A as there are different ways of making
the n decisions. Since each decision can be made in two ways (∈ or ∉),
we conclude by the fundamental principle of counting that there are
2 · 2 · 2 ··· 2 = 2^n ways of making all n decisions, and hence 2^n
subsets of A.
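
The construction used in the proof translates directly into a short
program. The sketch below is illustrative only; the function name
subsets and the use of True and False to stand for the two possible
decisions are assumptions made here.

```python
from itertools import product


def subsets(elements):
    """Return all subsets of `elements`, one per sequence of decisions."""
    elements = list(elements)
    result = []
    # One decision per element: True means "belongs", False means "does not".
    for decisions in product([True, False], repeat=len(elements)):
        result.append({x for x, keep in zip(elements, decisions) if keep})
    return result


A = ["a", "b", "c", "d"]
all_subsets = subsets(A)
print(len(all_subsets))  # 16, i.e. 2**4
print(subsets([]))       # [set()]: the empty set has exactly one subset
```

The decision sequence (True, True, False, True) produces the subset
{a, b, d}, matching the example given in the proof.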


Forming subsets of a given set is a method that generates a large
number of new sets. In fact, a set with 20 elements has 2^20, or more
than a million, different subsets. Ordinarily, however, as the
following examples point out, one is interested in studying a small number
of subsets from among the many available ones.


Example 2.4. Let a green and a red die be rolled, and let S denote
the set of possible outcomes. S has 36 elements which we can
enumerate as follows, using the abbreviation (x, y) to stand for "green
die showed the number x and red die showed the number y":

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
  . . .
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

By Theorem 2.1, there are 2^36 subsets of S. But relatively few of
these subsets have any special interest, even to players of "craps."
Some of these are:

(i) S_1 = the subset made up of those outcomes for which the sum
of the numbers on the two dice is 7; i.e.,

S_1 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.

(ii) S_2 = the subset containing the outcomes for which the sum of
the numbers on the two dice is 11; i.e.,

S_2 = {(5, 6), (6, 5)}.

(iii) S_3 = the subset containing the outcomes for which the sum
of the numbers on the two dice is either 7 or 11; i.e.,

S_3 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)}.

Whenever any experiment is performed (as in this example), we can
think of the set of all possible outcomes of the experiment. We shall
see that such sets and their subsets are of great importance in the
mathematical theory of probability.
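
The sets in this example are small enough to build by machine. The
following Python sketch is offered as an illustration; the variable
names S, S1, S2, S3 are our own stand-ins for the sets of the example.

```python
# Each outcome is a pair (green, red).
S = {(x, y) for x in range(1, 7) for y in range(1, 7)}

S1 = {(x, y) for (x, y) in S if x + y == 7}
S2 = {(x, y) for (x, y) in S if x + y == 11}
S3 = {(x, y) for (x, y) in S if x + y in (7, 11)}

print(len(S), 2 ** len(S))        # 36 outcomes and 68719476736 subsets
print(sorted(S1))
print(sorted(S2))
print(len(S1), len(S2), len(S3))  # 6 2 8
```

Each subset is specified by a defining property (a condition on the
sum) rather than by listing its elements, yet the resulting sets are
the same as those listed above.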


Example 2.5. The annual directory of college X lists the name,
hometown, college residence, and telephone number of each of the
college's 2000 students. Let A be the set of these 2000 entries, each
entry containing the four pieces of information described above. The
total number of subsets of A is astronomical, being 2^2000. But the
housemother in a certain dormitory is mainly concerned with the
subset of those entries containing the names of women who are
residents of her dormitory, the mathematics department must estimate
ahead of time the approximate number in the subset of entries naming
students who will elect mathematics courses, a student may be
especially interested in the subset of entries naming all freshmen who
come from his hometown, etc. If the information for each student is
entered by punching holes on certain specially-designed cards, then
the cards corresponding to these subsets of A, as well as many others,
can be sorted out of the whole set of cards by a machine. In fact,
such sorting machines are designed for the purpose of speedily
selecting certain subsets from a given set.


We conclude with an example designed to test the reader's grasp
of the difference between the notions of set membership (symbolized
by ∈) and set inclusion (symbolized by ⊂).


Example 2.6. Consider the set M of majorities in a committee of
four individuals, each having one vote. Let us label the individuals
a, b, c, d, and note that a majority is itself a set of three or more of
these committeemen. Thus, the set M has sets as elements, and we
write M using braces within braces:

M = {{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b, c, d}}.

Thus {a, b, c} is an element of M, but although a set, it is not a subset
of M. (Why?) But {{a, b, c}} is a subset of M since its only element,
{a, b, c}, is indeed also an element of M.
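
The distinction between membership and inclusion can also be tested
by machine. The sketch below assumes Python's built-in set types; the
inner sets are written as frozenset objects only because Python
requires the elements of a set to be immutable, and the name abc is
introduced here for convenience.

```python
a, b, c, d = "a", "b", "c", "d"

# The set M of majorities: a set whose elements are themselves sets.
M = {
    frozenset({a, b, c}),
    frozenset({a, b, d}),
    frozenset({a, c, d}),
    frozenset({b, c, d}),
    frozenset({a, b, c, d}),
}

abc = frozenset({a, b, c})

print(abc in M)    # True:  {a, b, c} is an element of M
print(abc <= M)    # False: {a, b, c} is not a subset of M
print({abc} <= M)  # True:  {{a, b, c}} is a subset of M
```

Here `in` tests membership and `<=` tests inclusion, so the three
printed values correspond to the three assertions of the example.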


PROBLEMS

2.1. Let S be the set of 36 outcomes of the experiment in which a green
and a red die are rolled. (See Example 2.4.) We define certain subsets
of S by listing their elements. State a defining property for each subset.
(a) {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}
(b) {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)}
(c) {(1, 3), (2, 2), (3, 1)}
(d) {(1, 3), (1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 5), (3, 6), (4, 6)}

2.2. Let A = {1, 2, 3}. Identify the sets B such that {1} ⊂ B, B ⊂ A, and
B ≠ A.

2.3. You are told that there is only one set A such that A ⊂ B. Identify
the set B.

2.4. Let A be any set. The set {X | X ⊂ A} of all subsets of A is called
the power set of A and is denoted by 2^A. (If A is a set with n elements,
then Theorem 2.1 says that the power set 2^A has 2^n elements. This fact
accounts for the name "power set" and the symbol 2^A used to denote
this set.) Explain the following true statements, assuming
A = {x, y, z}:

(a) ∅ ∉ A, but ∅ ∈ 2^A.

(b) x ∈ A, but x ∉ 2^A.

(c) {x, y} ∉ A, but {x, y} ∈ 2^A.

(d) A is an element, but is not a subset of 2^A.

(e) {A} is not an element, but is a subset of 2^A.

2.5. Which of the following are correct and why?

(a) {1} e(() (b) {1} € {{1}}

(c) {1} e {1, (0) (d) {9} € {1, (03

2.6. Give an example of two sets A and B such that both A ∈ B and
A ⊂ B are true.

2.7. The graph of the set of points C = {(x, y) | x^2 + y^2 = 4}, where x and
y are real numbers, is the circumference of the circle with center at
(0, 0) and radius 2 units. Determine the graphs of the following
subsets of C:

(a) {(x, y) ∈ C | x = 0}   (b) {(x, y) ∈ C | x = 2}

(c) {(x, y) ∈ C | x = 3}   (d) {(x, y) ∈ C | x > 0}

(e) {(x, y) ∈ C | y > 0}   (f) {(x, y) ∈ C | y = √(4 - x^2)}

2.8. (a) If x ∈ A and A ⊂ B, is it necessarily the case that x ∈ B?

(b) If x ∈ A and A ∈ B, is it necessarily the case that x ∈ B?

2.9. Draw a tree diagram to illustrate the fundamental principle of counting,
assuming n = 3 and N_1 = 4, N_2 = 3, N_3 = 2.

2.10. Assume the truth of the fundamental principle of counting for n = 2,
i.e., for a job made up of only two tasks. Prove the principle for any
positive integer n by mathematical induction.


In each of the following problems, state explicitly how the fundamental
principle of counting is used in obtaining your answer. Draw a tree
diagram where feasible.

2.11. A man has five coins in his pocket. He agrees to give one coin to his
son and one to his daughter. In how many ways can this be done?

2.12. In how many different orders can one call out the numbers 1, 2, 3, 4, 5?

2.13. In dialing a telephone number, one has to select seven slots, the first
two for the letters of the exchange, and then five digits to identify the
telephone in that exchange. The telephone dial contains ten slots, one
for each of the digits 0, 1, 2, ···, 9, but letters appear in only eight
of these slots. If the first number cannot be a zero, how many different
telephone numbers, distinguishable as dialed, are possible?

2.14. How many three-digit even integers can be formed from the digits
1, 5, 6, and 8, with no digit repeated?

2.15. How many different ways are there of selecting two letters from the
set {a, b, c}? Let the reader realize that the question as stated is vague
and needs to be made precise before it can be answered. We must know
how the letters are selected, and we must decide when results of the
selection process will be considered different. We list four possibilities.
Answer the question in each case.

(a) The first letter is chosen and the second is selected from the
remaining two letters; i.e., repetitions are not allowed. We count two
ways of making the selections different if they result in different
ordered pairs of letters; i.e., we record not only which two letters
were selected, but also the order in which they were selected.

(b) The first letter is chosen and the second is selected from the entire
set of three letters; i.e., repetitions are allowed. We count ordered
pairs of letters as in (a).

(c) Repetitions are not allowed, as in (a), and we count two ways of
making the selections different only if they result in different sets
of two letters; i.e., we disregard the order in which the letters were
selected.

(d) Repetitions are allowed, as in (b), and we disregard order as in (c).

2.16. How many different ways are there of selecting two cards from a
standard deck of 52 cards? Consider various interpretations of this
question, as in the preceding problem.

2.17. How many ways can three coins fall? four coins? m coins, where m
is any positive integer?

2.18. Two cards are drawn one after the other from a standard deck of 52
cards. In how many ways can one draw

(a) first a spade and then a heart?

(b) first a spade and then a heart or a diamond?

(c) first a spade and then another spade?

2.19. Repeat the preceding problem, assuming the first card is put back in
the deck before the second is drawn.

2.20. Let A = {1, 2, 3, ···, 365}. (a) Two numbers are selected in order,
each from the full set A. The result is an ordered 2-tuple, or ordered
pair of numbers. How many are there? (b) Three numbers are selected
in order, each from the full set A. The result is an ordered 3-tuple,
or ordered triple of numbers. How many are there? (c) r numbers are
selected in order, each from the full set A, where r is some positive
integer. The result is an ordered r-tuple. How many are there? (Note:
A general definition of an ordered r-tuple is given in Section 5.)

2.21. (a) How many ways are there of placing three distinguishable balls
into two numbered cells? Into three numbered cells? Into n numbered
cells?

(b) How many ways are there of placing r distinguishable balls into
two numbered cells? Into three numbered cells? Into n numbered
cells?


3. Operations on sets 


In any particular discussion of sets, it is necessary to define some
fixed set of elements (called the universal set) to which we limit the
discussion. This point has been eloquently made by Langer:


In ordinary conversation, we assume the limitations of such a universe, 
as when we say: "Everybody knows that another war is coming,” and 
assume that “everybody” will be properly understood to refer only to 
adults of normal intelligence and European culture, not to babies in their 
cribs, or the inhabitants of remote wildernesses. For conversational pur- 
poses, the tacit understanding will do; but if the statement is to be chal- 
lenged, i.e., if someone volunteers to produce a person to whom it is not 
true, then it becomes important to know just what the limits of its appli- 
cability really are. Arguments of this sort have their own technique, by 
which the opposition marshals contradictory cases—in this example, 
persons who have no such knowledge—and the asseverator rules them 
out as "not meant" by his statement. The universe of ordinary discourse 
is vague enough so that this process can go on as long as the bellicosity of 
the two adversaries lasts. Logicians and scientists, however, take no 
pleasure in casuistry. Their universe of discourse must be definite 
enough to allow no dispute whatever about what does or does not belong
to it.*

The fixed universal set we shall denote by U. Once having decided
on the universal set U for a particular discussion, all other sets in that
same discussion must be subsets of U. But different universal sets can
be used for different discussions. U may be a set of people in one
problem, a different set of people in another, a set of numbers in yet
another, etc.


* Susanne K. Langer, An Introduction to Symbolic Logic, 2nd edition, Dover
Publications, Inc., 1953, p. 68.

We now define the three basic operations on sets.


Definition 3.1. Let A and B be any subsets of a universal set U.
Then

I. The complement of A (with respect to U) is the set of elements
of U that do not belong to A. The complement of A is denoted by A',
the particular universal set being understood from the context. In
symbols,

A' = {x ∈ U | x ∉ A}.

II. The intersection of A and B is the set of elements that belong
to both A and B. The intersection of A and B is denoted by A ∩ B,
which is read "A cap B" or "A intersection B." In symbols,

A ∩ B = {x | x ∈ A and x ∈ B}.

III. The union of A and B is the set of elements that belong to at
least one of the sets A and B, i.e., to A or B. The union of A and B is
denoted by A ∪ B, which is read "A cup B" or "A union B." In
symbols,

A ∪ B = {x | x ∈ A or x ∈ B}.


A comment about the meaning of the word "or" in mathematics is
in order here. This logical connective is ambiguous in everyday
language, sometimes being used in the inclusive sense (in which "p or q"
is taken to mean "p or q, or both p and q") and other times being
used in the exclusive sense (in which "p or q" means "p or q, but not
both"). As we have explicitly indicated by our wording, the "or" in
the definition of union of two sets is to be taken in the inclusive sense,
in which it is synonymous with the legal use of "and/or." We adhere
to the accepted mathematical usage and shall henceforth always so
interpret the word "or." The words "not," "and," and "or" are
italicized in Definition 3.1, for they are the key words to remember
in the definitions of complement, intersection, and union of sets.

The following examples illustrate how one obtains new sets by 
applying the operations in Definition 3.1 to given sets. 


Example 3.1. Let the universal set U be the set of letters in the
alphabet, and let A be the subset of vowels, and B the subset
containing the first three letters, i.e.,

A = {a, e, i, o, u},   B = {a, b, c}.

Then, by Definition 3.1,

A' = the set of consonants,   B' = {d, e, f, ···, z},
A ∪ B = {a, b, c, e, i, o, u},   A ∩ B = {a}.


Example 3.2. Let the universal set U be the set of all residents of
New York City. Let A denote the set of male New Yorkers, B the
set of New Yorkers who live in the borough of Brooklyn, and C the
set of baseball fans in New York who are rooting for the Dodgers to
win the National League pennant. Then A' = set of female New
Yorkers, B' = set of New Yorkers who do not live in Brooklyn,
C' = set of New Yorkers who are not baseball fans rooting for the
Dodgers, i.e., the set of New Yorkers who either are not baseball fans
at all or, if they are baseball fans, are not rooting for the Dodgers to
win the pennant, A ∩ B = set of male residents of Brooklyn,
A ∪ B = set of New Yorkers who are male or Brooklynites, and
B ∩ C = set of Brooklynites who are also baseball fans rooting for
the Dodgers to win the National League pennant. (It was erroneously
asserted by some bitter elements of set B that, when the Dodgers
moved to Los Angeles, it would be true that B ∩ C = ∅.)


Suppose subsets A, B, C of a universal set U are given. Since A'
and B ∩ C are themselves sets, we can form their intersection
A' ∩ (B ∩ C), the set of all elements in U that do not belong to A
but do belong to both B and C. Similarly, we can take the
complement of the intersection B ∩ C, symbolized by (B ∩ C)', and thus
obtain the set of objects in U that are not in both B and C, i.e., that

Example 3.3. Let U = {1, 2, 3, 4, 5, 6, 7} be the universal set, and
consider the subsets of U given by

A = {1, 2, 3},   B = {2, 4, 6},   C = {1, 3, 5, 7}.

By applying Definition 3.1, we find

A' = {4, 5, 6, 7},   B' = {1, 3, 5, 7} = C,   C' = {2, 4, 6} = B,
A ∪ B = {1, 2, 3, 4, 6},   A ∪ C = {1, 2, 3, 5, 7},   B ∪ C = U,
A ∩ B = {2},   A ∩ C = {1, 3},   B ∩ C = ∅.

We can continue forming complements, unions, and intersections of
these sets. For example,

(A')' = {4, 5, 6, 7}' = {1, 2, 3} = A,
(A ∪ B)' = {1, 2, 3, 4, 6}' = {5, 7},
(B ∩ C)' = ∅' = U,
(A ∩ B) ∪ C = {2} ∪ {1, 3, 5, 7} = {1, 2, 3, 5, 7},
(A ∪ C) ∩ (A ∩ C) = {1, 2, 3, 5, 7} ∩ {1, 3} = {1, 3}, etc.
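
The computations of this example can be reproduced with the set
operations built into Python, where `|` denotes union, `&` denotes
intersection, and `-` denotes set difference. The helper complement
below is our own name for the operation A'; this is a sketch for
checking the arithmetic, not part of the formal development.

```python
U = frozenset(range(1, 8))  # the universal set {1, 2, ..., 7}
A = frozenset({1, 2, 3})
B = frozenset({2, 4, 6})
C = frozenset({1, 3, 5, 7})


def complement(X):
    """Complement of X with respect to the universal set U."""
    return U - X


print(sorted(complement(A)))           # [4, 5, 6, 7]
print(complement(B) == C)              # True
print(sorted(A | B))                   # [1, 2, 3, 4, 6]
print((B | C) == U)                    # True
print(sorted(A & C))                   # [1, 3]
print((B & C) == frozenset())          # True
print(complement(complement(A)) == A)  # True
print(sorted(complement(A | B)))       # [5, 7]
print(sorted((A | C) & (A & C)))       # [1, 3]
```

Note that the complement always requires the universal set, whereas
union and intersection are computed from the two given sets alone.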


When considering sets and operations on sets, it is helpful to
represent the sets pictorially. A rectangle is drawn to represent the
universal set U. A subset A of U is represented by the region within a
circle drawn inside the rectangle. Then A', the complement of A,
will be represented by the part of the rectangle outside the circle, as
in Figure 5.

Such diagrams, called Venn diagrams (after the English logician
John Venn, 1834-1883), can be drawn for a problem involving two
subsets, say A and B, of some universal set U. In Figure 6, we have
labeled the four nonoverlapping regions of the rectangle
corresponding to the following four possibilities for any element x ∈ U:

(1) x ∈ A and x ∈ B, i.e., x ∈ A ∩ B    (Region R_1)
(2) x ∈ A and x ∉ B, i.e., x ∈ A ∩ B'   (Region R_2)
(3) x ∉ A and x ∈ B, i.e., x ∈ A' ∩ B   (Region R_3)
(4) x ∉ A and x ∉ B, i.e., x ∈ A' ∩ B'  (Region R_4)

It is important to observe that complements, unions, and
intersections of the sets A and B can be represented by combinations of
one or more of the regions in Figure 6, as in Table 2. Furthermore,
given any set written in terms of operations on A and B, we can
easily determine the particular combination of regions representing
this set. For example, the set (A ∩ B)' contains those elements that
are not in A ∩ B, i.e., not in region R_1. Hence, (A ∩ B)' is
represented by region R_2 & R_3 & R_4.

TABLE 2 [sets formed from A and B and the regions of Figure 6 representing them]

Suppose we are told that A ⊂ B. Although the Venn diagram is
often drawn in this case with the circle representing A entirely
within the circle representing B, we prefer to use Figure 6. But since
we are told that every element in A is also in B, we conclude that
region R_2 represents ∅, i.e., A ∩ B' = ∅. Furthermore, the regions
R_1 & R_2 and R_1 must now represent the same set of points, i.e.,
A = A ∩ B. Similarly, R_1 & R_3 and R_1 & R_2 & R_3 also represent
equal sets, so that B = A ∪ B. In
this way, we see that the following are all equivalent assertions, each
giving the information that every element of A is also in B:

(1) A ⊂ B,   (2) A ∩ B' = ∅,   (3) A = A ∩ B,   (4) B = A ∪ B.
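
The equivalence of these four assertions can be spot-checked by
exhaustive search over a small universal set. The Python sketch below
assumes a four-element universe chosen arbitrarily for illustration;
the names all_subsets and assertions are ours. A finite check of this
kind illustrates the equivalence but does not replace the argument
from the Venn diagram.

```python
from itertools import combinations

U = frozenset({1, 2, 3, 4})  # a small universal set, chosen arbitrarily


def all_subsets(universe):
    """List every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def assertions(A, B):
    """Truth values of assertions (1)-(4) for the pair (A, B)."""
    return (
        A <= B,                        # (1) A is a subset of B
        (A & (U - B)) == frozenset(),  # (2) A ∩ B' is empty
        A == (A & B),                  # (3) A = A ∩ B
        B == (A | B),                  # (4) B = A ∪ B
    )


results = [assertions(A, B) for A in all_subsets(U) for B in all_subsets(U)]
print(len(results))                            # 256 pairs (A, B)
print(all(len(set(r)) == 1 for r in results))  # True: the four always agree
```

For every one of the 256 pairs, the four assertions are either all
true or all false, as the discussion above leads us to expect.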


In order to consider another application of Venn diagrams, we 
need to make an important definition. 


Definition 3.2. Two sets A and B are said to be disjoint or mutually
exclusive if they have no elements in common, i.e., if A ∩ B = ∅.

When A and B are disjoint, one customarily draws a Venn diagram
with nonoverlapping circles representing A and B. But we can
equally well use the diagram in Figure 6, provided we note that
region R_1 represents the empty set.


Example 3.4. If S is any set, let us denote by n(S) the number of
elements in S. If A and B are disjoint sets, then the number of
elements in A or in B is the sum of the number of elements in A and the
number of elements in B; i.e.,

(3.1)   n(A ∪ B) = n(A) + n(B)   if A ∩ B = ∅.

To find a formula for n(A ∪ B) when A and B are not necessarily
disjoint, we proceed as follows. A ∩ B' and A ∩ B are disjoint sets
(why?) whose union, as is easily seen from Figure 6, is the set A.
Hence by (3.1),

n(A ∩ B') + n(A ∩ B) = n(A).

Also, since A ∩ B and A' ∩ B are disjoint sets whose union is B,

n(A ∩ B) + n(A' ∩ B) = n(B).

If we add these equations and subtract n(A ∩ B) from both sides of
the result, we obtain

n(A ∩ B') + n(A ∩ B) + n(A' ∩ B) = n(A) + n(B) - n(A ∩ B).

But, referring to Figure 6, we recognize the left-hand side of this
equation as the number of elements in the set represented by region
R_1 & R_2 & R_3. But this region represents the union A ∪ B. Hence,
we obtain the formula

(3.2)   n(A ∪ B) = n(A) + n(B) - n(A ∩ B),

which is valid for any sets A and B. Note that (3.2) reduces to (3.1)
when A and B are disjoint, for then n(A ∩ B) = n(∅) = 0.

Suppose we pick one card from a standard deck of 52 cards. Let U be
the set of 52 possible choices, and let A and B denote the set of aces
and the set of spades, respectively. Obviously n(A) = 4 and n(B) = 13.
But n(A ∪ B) ≠ 17, since A and B are not disjoint. Indeed,
n(A ∩ B) = 1, since only the ace of spades is common to A and B. By
(3.2) we correctly find n(A ∪ B) = 16. As expected, we obtain an ace
or a spade with 16 different cards.

Figure 7
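
Formula (3.2) can be confirmed for the deck of cards by direct
counting. In the following Python sketch the representation of a card
as a (rank, suit) pair and the function name n are assumptions made for
the purpose of illustration.

```python
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
U = set(product(RANKS, SUITS))  # the 52 possible choices

A = {card for card in U if card[0] == "A"}       # the set of aces
B = {card for card in U if card[1] == "spades"}  # the set of spades


def n(S):
    """Number of elements in the set S."""
    return len(S)


print(n(A), n(B), n(A & B))    # 4 13 1
print(n(A | B))                # 16, counted directly
print(n(A) + n(B) - n(A & B))  # 16, computed from formula (3.2)
```

The direct count of the union and the value given by (3.2) agree,
while the naive sum n(A) + n(B) = 17 counts the ace of spades twice.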


The general Venn diagram in the case of three subsets A, B, and C
is found in Figure 7, where we have labeled the eight nonoverlapping
regions corresponding to the following eight possibilities for any
element x ∈ U:

(1) x ∈ A and x ∈ B and x ∈ C   (Region R_1)
(2) x ∈ A and x ∈ B and x ∉ C   (Region R_2)
(3) x ∈ A and x ∉ B and x ∈ C   (Region R_3)
(4) x ∈ A and x ∉ B and x ∉ C   (Region R_4)
(5) x ∉ A and x ∈ B and x ∈ C   (Region R_5)
(6) x ∉ A and x ∈ B and x ∉ C   (Region R_6)
(7) x ∉ A and x ∉ B and x ∈ C   (Region R_7)
(8) x ∉ A and x ∉ B and x ∉ C   (Region R_8)

It will be helpful in our later work to have clearly in mind the
correspondence between various subsets of U and regions of the Venn
diagram in Figure 7. The reader should check the examples in
Table 3.

TABLE 3

Set              Region in Figure 7

U                R_1 & R_2 & R_3 & R_4 & R_5 & R_6 & R_7 & R_8
A                R_1 & R_2 & R_3 & R_4
B                R_1 & R_2 & R_5 & R_6
C                R_1 & R_3 & R_5 & R_7
A ∩ B            R_1 & R_2
A ∪ B            R_1 & R_2 & R_3 & R_4 & R_5 & R_6
(A ∪ B) ∩ C      R_1 & R_3 & R_5
A'               R_5 & R_6 & R_7 & R_8
A' ∩ (A ∩ B)     None (the set is empty)
B ∩ C            R_1 & R_5
(A ∩ B) ∩ C'     R_2

We see that by starting with two or more subsets of some universal 
set and forming their complements, unions, and intersections, many 
other subsets are obtained. In the next section, we explore certain 
interesting and important relationships among these subsets. We 
conclude here with another example in 
which a Venn diagram proves helpful. 


Example 3.5. Persons are classified according to blood type and Rh
quality by testing a blood sample for the presence of three antigens:
A, B, and Rh. Blood is of type AB if it contains both antigens A and B,
of type A if it contains A but not B, of type B if it contains B
but not A, and of type O if it contains neither A nor B. In addition,
blood is classified as Rh positive (+) if the Rh antigen is present, as
Rh negative (-) otherwise. If we let A, B, and Rh denote the sets
of people whose blood contains the A, B, and Rh antigens
respectively, then all people are classified into one of the eight categories
indicated in the Venn diagram of Figure 8.

Figure 8

Suppose a laboratory technician reports the following statistics
after testing blood samples of 100 people:

50 contain antigen A
52 contain antigen B
40 contain antigen Rh
20 contain both A and B
13 contain both A and Rh
15 contain both B and Rh
5 contain all three antigens

How many persons of type A- did the technician find? To answer
this sort of question, we use the data to fill in the number of
people in each of the eight subsets in Figure 8. The trick here is to
use the data in reverse order, i.e., work from the bottom of the list
to the top. Thus, the last item reported tells us there are five people
of type AB+. The 15 people reported to have both B and Rh must
be of type AB+ or B+. Since five people are already identified as
type AB+, we infer that ten people are of type B+. In this way we
complete the enumeration, and thus obtain the number of people in
each of the eight categories. We find there were 22 people of type A-.
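
The bottom-to-top enumeration can be written out step by step. In the
Python sketch below the variable names and the dictionary counts are
our own; each line subtracts from a reported total the categories that
have already been identified.

```python
total = 100
n_A, n_B, n_Rh = 50, 52, 40             # single antigens
n_A_B, n_A_Rh, n_B_Rh = 20, 13, 15      # pairs of antigens
n_A_B_Rh = 5                            # all three antigens

counts = {}
counts["AB+"] = n_A_B_Rh                # A, B, and Rh
counts["B+"] = n_B_Rh - counts["AB+"]   # B and Rh, but not A
counts["A+"] = n_A_Rh - counts["AB+"]   # A and Rh, but not B
counts["AB-"] = n_A_B - counts["AB+"]   # A and B, but not Rh
counts["O+"] = n_Rh - counts["AB+"] - counts["A+"] - counts["B+"]
counts["B-"] = n_B - counts["AB+"] - counts["AB-"] - counts["B+"]
counts["A-"] = n_A - counts["AB+"] - counts["AB-"] - counts["A+"]
counts["O-"] = total - sum(counts.values())  # none of the three antigens

print(counts["B+"])  # 10
print(counts["A-"])  # 22
print(counts)
```

The eight counts are 5, 10, 8, 15, 17, 22, 22, and 1 in the order
computed, and they add up to the 100 people tested.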


PROBLEMS 


3.1. Let U = {a, b, c}, A = {a}, B = {b}. List the elements of the
following sets: A', B', A ∪ B, A ∩ B, A' ∩ B', A' ∩ (A ∪ B).


3.2. Refer to the Venn diagram in Figure 6. Determine the region or com- 
bination of regions representing each of the sets (a) (AU B), 
(b) A’ U BY, (c) AUB, (d) (AY, ©) (A' ^ BYU B. 

3.3. A universal set U has eight elements corresponding to the eight pos- 
sible outcomes of the experiment in which a penny, a nickel, and a 
dime are tossed. 

(a) List the elements of U. 

(b) Suppose subset A contains those elements corresponding to out- 
comes for which the penny falls heads, subset B those for which all 
three coins match, and subset C those for which the number of 

heads exceeds the number of tails. List the elements of the follow- 

ingocior a’, BA UB, AM 0B OA TB, AGB PG, 

ANC, (A A B) O C, and (A Y B) (^ C.

3.4. The universal set U contains the 52 cards in a standard deck. Let S
denote the subset of spades, D the subset of diamonds, and H the subset 
of honor cards (i.e., ten, jack, queen, king, or ace.) 


(a) Identify the following sets and count the number of elements in 
each: SH, S', DAS, DAS, DUS, (SUD) OH. 

(b) Write the following sets in symbolic form as in (a): the set of cards 
that are not honor cards, the set of cards that are neither spades 
nor honor cards, the set of clubs or hearts that are not honor cards. 


(Note: It is instructive to try to write this last set in at least three 
different ways.) 


3.5. Table 4 classifies 321 union men with respect to two characteristics:
(1) the number of years each has been in the union, and (2) his answer
to the question, "Are you willing to picket to help some other shop
get organized or get a raise in pay?"

TABLE 4

                            Number of Years in the Union
Response to Question    Less than 1    1-3    4-10    Over 10    Total

Yes ............             27         54     137       28       246
No .............             14         18      34        3        69
Don't know .....              3          2       1        0         6

Total                        44         74     172       31       321

[Source: Arnold M. Rose, Union Solidarity, University of Minnesota Press, 1952, p. 77.]

Let the 321 men in this survey be the elements of our universal set U,
and define the following subsets of U:

Y = set of men who answer "yes,"
N = set of men who answer "no,"
A = set of men who are in the union less than one year,
B = set of men who are in the union 1-3 years,
C = set of men who are in the union 4-10 years.


(a) Find the number of men in each of the following sets: (i) FOB, 
Gi) Y U B, Gi) (Y U NY A, Gv) N A CY. 

(b) Write each of the following sets, using only the symbols A, B, C,
Y, N, ', ∪, and ∩. [Example: the set of men who answer "yes"
and are in the union less than four years is the set Y ∩ (A ∪ B).]

(1) the set of men who answer "yes" and are in the union 4-10
years.

(2) the set of men who answer "yes" and are in the union at least
four years.

(3) the set of men who answer "don't know."

(4) the set of men who answer "don't know" and who are in the
union over ten years. (What does the survey tell you about
this set?)


3.6. Let U be the set of all points in the plane. Relative to some fixed
rectangular coordinate system, we can write

U = {(x, y) | x ∈ R and y ∈ R},

where R is the set of all real numbers. Let subsets of U be defined as
follows:

A = {(x, y) | x ≥ 0},   B = {(x, y) | y ≥ 0},
C = {(x, y) | x + 2y < 6},   D = {(x, y) | y - x ≥ 0}.

Sketch graphs of the sets (a) A, (b) B, (c) C, (d) D, (e) A ∩ B,
(f) (A ∩ B) ∩ C, (g) [(A ∩ B) ∩ C] ∩ D.

3.7. (a) Express n(U), the number of elements in the universal set U, in
terms of n(A), the number of elements in subset A, and n(A'), the
number of elements in the complement of A.

(b) The formula you wrote in (a) can be deduced from Formula (3.2)
of the text. Show how to do this.

3.8. A psychologist ran 50 mice through a maze experiment and reported
the following data: 25 mice were male, 25 were previously trained, 20
turned left (at the first choice-point), 10 were previously trained males,
4 males turned left, 15 previously trained mice turned left, and 3
previously trained males turned left. Draw an appropriate Venn
diagram and determine the number of female mice who were not previously
trained and who did not turn left.

3.9. Of 63 member colleges of the Council For The Advancement of Small
Colleges, Inc., 24 were founded before 1931, were coed, and reported
annual student costs of less than $1,000; 41 were founded before 1931
and were coed; 27 were founded before 1931 and had student costs of
less than $1,000; 45 were founded before 1931; 52 were coed; 34 had
student costs of less than $1,000; 4 were not founded before 1931,
were not coed, and had student costs of at least $1,000. (Data reported
in Supplement Section 11, The New York Times, October 11, 1959.)
A high school senior wants to attend a coed college that is relatively
new, say founded 1931 or later. His annual student costs must be less
than $1,000. How many of the 63 small colleges meet his requirements?


3.10. Show that each of the following pairs of sets are represented by the
same region in the Venn diagram of Figure 7:

(a) (A ∪ B)' and A' ∩ B',

(b) (A ∩ B)' and A' ∪ B',

(c) (A ∪ B) ∪ C and A ∪ (B ∪ C),

(d) A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C).

3.11. A, B, and C are subsets of a universal set U. Arrange the following sets
in sequential order so that each set in the sequence is a subset of the
next set: A ∪ B, U, A ∩ B, ∅, B, A ∪ (B ∪ C), (A ∩ B) ∩ C,
(A ∪ B) ∪ C, ∅', B ∩ A.

3.12. (a) Show that U' = ∅ and ∅' = U.

(b) Show that if A ⊂ B, then B' ⊂ A'.

(c) Suppose A ∪ B = ∅. What conclusion can you draw about the
sets A and B?

(d) Suppose A ∩ B = ∅. Does it follow that A = ∅ or B = ∅?

3.13. Let U be the set of all people and

M = the set of all males,
C = the set of all college students,
I = the set of all intelligent people,
S = the set of sorority members,
B = the set of beer drinkers,
P = the set of professors,
W = the set of well-dressed people.

Translate each of the following sentences into an equation or an
inequality using only the letters standing for sets and the symbols =,
≠, ∅, ', ∩, ∪. (For example, the sentence "All college students are
intelligent" means that the set of college students is a subset of the
set of intelligent people, i.e., C ⊂ I. But we are not permitted use of
the set inclusion symbol. Hence we refer to the discussion on p. 20
and rewrite the sentence in any of the equivalent forms C ∩ I = C,
C ∪ I = I, or C ∩ I' = ∅. Similarly, the sentence "Some college
students are intelligent" means there is at least one member of the
intersection C ∩ I. Hence this sentence is translated into C ∩ I ≠ ∅.)

(a) All professors are beer drinkers.

(b) No males are sorority members.

(c) No male college student is well dressed.

(d) Sorority members are neither intelligent nor male.

(e) Some professors are beer drinking males.

(f) Some professors who drink beer are not males.

(g) Some professors who drink beer are neither intelligent nor well
dressed.

(h) College students and professors are beer-drinkers.

(i) If a person is a beer-drinker, then that person is intelligent.

(j) If a person is intelligent, then that person is a beer drinker.

(k) A person is a beer drinker if and only if he is intelligent.


Read the following discussion carefully. Then to test your understand- 
ing, do the exercises. 

A finite set U of n people is given. U will be called a decision-making
body. Let 𝒰 = 2^U be the power set of U, as defined in Problem 2.4.
Each element of 𝒰 is a subset of U and will be called a coalition. (In
particular, the empty set ∅ and the set U itself are coalitions. There
are 2^n coalitions altogether.)

We select a subset W of 𝒰 and write 𝒰 = W ∪ W′, where
W′ is the complement of W with respect to 𝒰. Since W and W′ are
disjoint, each coalition is in exactly one of W (the set of winning
coalitions) or W′ (the set of nonwinning coalitions).

Now consider the set W′. An element of W′ will be called a losing
coalition if its complement (with respect to U) is a winning coalition.
Thus L, the set of losing coalitions, is defined by

L = {A | A ∈ W′ and A′ ∈ W}.

[Note that W′ means the complement of W with respect to 𝒰, whereas
A′ means the complement of A with respect to U. This confusion arises
from the fact that U serves as universal set for all sets whose elements
are people, whereas 𝒰 serves as universal set for all sets whose elements
are coalitions (sets of people).]

Finally, a coalition that is nonwinning itself and whose complement
is also nonwinning is called a blocking coalition. Thus B, the set of
blocking coalitions, is defined by

B = {A | A ∈ W′ and A′ ∈ W′}.

Some decision-making bodies contain important persons who get
special names: A person x ∈ U is said to be a dictator if {x} ∈ W, i.e.,
if x is the sole member of a winning coalition. A person y ∈ U is said
to have veto power if {y} ∈ B, i.e., if y is the sole member of a blocking
coalition.

In each of the following exercises, a particular decision-making body
is described and its voting rules specified. Interpret a winning coalition
to mean a set of persons who control enough votes to carry a proposal.
Find all winning coalitions, losing coalitions, blocking coalitions.
Determine if any members are dictators or have veto power.
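The definitions above can be tried out mechanically. The sketch below assumes a voting rule given by a number of votes per person and a quota; the function names `coalitions` and `classify` and the three-person body are invented for illustration, and the body is not one of the exercises.

```python
from itertools import combinations

def coalitions(people):
    """Return every subset of `people` as a frozenset (the power set)."""
    members = sorted(people)
    return [frozenset(c) for r in range(len(members) + 1)
            for c in combinations(members, r)]

def classify(votes, quota):
    """Split all coalitions into winning, losing, and blocking ones.

    votes -- dict mapping each person to the number of votes held
    quota -- number of votes needed to carry a proposal
    """
    people = frozenset(votes)

    def wins(coalition):
        return sum(votes[p] for p in coalition) >= quota

    winning = {c for c in coalitions(people) if wins(c)}
    nonwinning = set(coalitions(people)) - winning
    # Losing: nonwinning, and the complement (within the people) wins.
    losing = {c for c in nonwinning if (people - c) in winning}
    # Blocking: nonwinning, and the complement is nonwinning too.
    blocking = {c for c in nonwinning if (people - c) not in winning}
    dictators = {p for p in people if frozenset([p]) in winning}
    veto = {p for p in people if frozenset([p]) in blocking}
    return winning, losing, blocking, dictators, veto

# Hypothetical body: a holds 2 votes, b and c hold 1 vote each, and
# 3 of the 4 votes are needed to carry a proposal.
winning, losing, blocking, dictators, veto = classify(
    {"a": 2, "b": 1, "c": 1}, quota=3)
```

In this invented body nobody is a dictator, but a alone forms a blocking coalition and so has veto power.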




Exercise 1. A committee consists of four people, each with one vote. 
Majority rule applies; i.e., three votes are needed to carry a proposal. 


Exercise 2. A small corporation with 100 shares of stock outstanding
has three shareholders. Individual a owns 50 shares, b owns 30 shares,
and c owns 20 shares. Each share has one vote, and simple majority
rule applies.


Exercise 3. Same as Exercise 2, except that b has sold one of his 
shares to a. 


Exercise 4. A student-faculty committee consists of five students and 
four faculty members. For a proposal to be passed on to the entire 
faculty for its consideration, at least three students and three faculty 
members must vote for the proposal. Each member has one vote. 


4. The algebra of sets 


We have studied a number of ways of obtaining other sets, once a
universal set U is given. There are the many sets that can be
constructed by performing the operations of complement, union, and
intersection on subsets of U. The reader must suspect by now
(especially in view of the results in Problems 3.10-3.12) that there are
many relationships among the sets obtained in this way. These
relationships form the subject matter of the present section.

We begin by listing a number of important laws obeyed by sets.
All follow from our definitions of the empty set ∅, the universal set
U, the operations denoted by ′, ∩, and ∪, together with the definition
of set equality.


Theorem 4.1. Let A, B, and C be any subsets of a universal set U. 
Then the following laws hold: 


Identity laws:

1a. A ∪ ∅ = A                 1b. A ∩ U = A
2a. A ∪ U = U                 2b. A ∩ ∅ = ∅

Idempotent laws:

3a. A ∪ A = A                 3b. A ∩ A = A

Complement laws:

4a. A ∪ A′ = U                4b. A ∩ A′ = ∅
5a. (A′)′ = A

Commutative laws:

6a. A ∪ B = B ∪ A             6b. A ∩ B = B ∩ A

De Morgan’s laws:

7a. (A ∪ B)′ = A′ ∩ B′        7b. (A ∩ B)′ = A′ ∪ B′

Associative laws:

8a. A ∪ (B ∪ C) = (A ∪ B) ∪ C
8b. A ∩ (B ∩ C) = (A ∩ B) ∩ C

Distributive laws:

9a. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
9b. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Before proving these laws, let us note that we are familiar with
many of their names from the ordinary algebra of numbers. Thus,
addition and multiplication of numbers are commutative, i.e.,

a + b = b + a and a × b = b × a,

for any numbers a and b. Analogously, Laws 6a and 6b assert that
the order in which two sets are written does not affect their union
or intersection. For any numbers a, b, and c, we recall the associative
laws

a + (b + c) = (a + b) + c and a × (b × c) = (a × b) × c.

The analogy with 8a and 8b is clear. The associative law 8a asserts
that the same set is obtained if we form the union of A with the union
of B and C or if we form instead the union of the union of A and B
with C. In ordinary algebra, we have only one distributive law,
namely, a × (b + c) = a × b + a × c. This is analogous to 9b, one
of the two distributive laws for sets.

Since adding zero to any number yields that same number as sum,
0 is called an identity number with respect to addition. Similarly,
since a × 1 = a for any number a, we say that 1 is an identity
number with respect to multiplication. As Laws 1a and 1b show, the
empty set ∅ is an identity set with respect to union and the universal
set U is an identity set with respect to intersection.

Because of these analogies, A ∪ B is sometimes called the logical
sum and A ∩ B the logical product of the sets A and B. But the
analogy with ordinary algebra is not perfect, as a glance at the
idempotent laws shows. If a is a number, then a + a = 2a; if A is a set,
then A ∪ A = A.
It is instructive to try to translate these laws into prose form. For 
example, 7a asserts that the complement of the union of any two 
sets is equal to the intersection of their complements, and 7b asserts 
that the complement of the intersection of any two sets is equal to 
the union of their complements. 

Finally, let us note that (except for 5a) the laws in Theorem 4.1 
are listed in pairs, 1a and 1b, 2a and 2b, etc. We shall comment on 
the significance of this fact after we discuss the proof of these laws. 


Our method of proof involves the use of membership tables. The
basic membership tables for complement, intersection, and union
appear in Tables 5-7. In the first column of Table 5, we symbolize
the two possibilities for any element x of the universal set U: either
x ∈ A or x ∉ A. If x ∈ A, then x ∉ A′, and if x ∉ A, then x ∈ A′. These
facts follow from the definition of complement and are summarized
in the two rows of Table 5, the membership table for A′.

TABLE 5

A  ||  A′
∈  ||  ∉
∉  ||  ∈

TABLE 6

A  B  ||  A ∩ B
∈  ∈  ||    ∈
∈  ∉  ||    ∉
∉  ∈  ||    ∉
∉  ∉  ||    ∉

TABLE 7

A  B  ||  A ∪ B
∈  ∈  ||    ∈
∈  ∉  ||    ∈
∉  ∈  ||    ∈
∉  ∉  ||    ∉

With respect to the sets A and B, each element x of the universal
set U falls into exactly one of the following categories: (1) x ∈ A and
x ∈ B, (2) x ∈ A and x ∉ B, (3) x ∉ A and x ∈ B, and (4) x ∉ A and x ∉ B.
These are the four possibilities symbolized in the four rows to the left
of the double vertical line in Table 6, the membership table for
A ∩ B. To the right, in the column headed A ∩ B, is summarized
the membership status of x with respect to the intersection of A and
B. That is, by the definition of intersection, x ∈ A ∩ B in the case
(row 1) when x ∈ A and x ∈ B; x ∉ A ∩ B in all other cases (rows 2-4).

Table 7, the membership table for A ∪ B, is similarly interpreted.
We know that x ∈ A ∪ B if and only if x belongs to at least one of
the sets A and B. Hence, an ∈ appears under A ∪ B in rows 1-3 of
Table 7, but an ∉ appears in row 4.

Using these basic tables, we can construct membership tables for 
other sets. The details of this construction, as well as the rationale 
behind the use of membership tables for proving equality between 
sets, are best explained in the context of some examples. 


Example 4.1. To prove De Morgan's law,

(A ∪ B)′ = A′ ∩ B′,

we proceed as follows. Since this law involves two arbitrary sets A
and B, we start by listing in columns (1) and (2) of Table 8 the four
possibilities for an element x ∈ U. Since the set (A ∪ B)′ is obtained
by first forming A ∪ B and then taking its complement, we include
a column for A ∪ B and another for (A ∪ B)′. The entries in column
(3) are obtained from (1) and (2) by use of the basic membership
table for A ∪ B. The entry in each row of column (4) is obtained
from the entry in the corresponding row of column (3) by using Table 5.

TABLE 8

(1)  (2)   (3)      (4)      (5)  (6)    (7)
 A    B   A ∪ B  (A ∪ B)′    A′   B′   A′ ∩ B′
 ∈    ∈     ∈       ∉        ∉    ∉      ∉
 ∈    ∉     ∈       ∉        ∉    ∈      ∉
 ∉    ∈     ∈       ∉        ∈    ∉      ∉
 ∉    ∉     ∉       ∈        ∈    ∈      ∈

The set A′ ∩ B′ appearing in the law we are trying to prove is
obtained by forming A′ and B′ and then taking their intersection.
Hence we have columns (5)-(7) in Table 8. The entry in each row
of column (5) is obtained from the entry in the corresponding row of
column (1) by use of the basic membership table for A′. Column (6)
is similarly obtained from column (2). Finally, from (5) and (6) we
get column (7) by using the basic membership table for intersection.

The crucial observation is that columns (4) and (7) are identical:
whenever a row contains an ∈ in column (4) it also contains an ∈ in
column (7) and likewise for the occurrences of ∉. We conclude that
whenever an element of U belongs to (A ∪ B)′, it also belongs to
A′ ∩ B′, i.e.,

(4.1) (A ∪ B)′ ⊂ A′ ∩ B′.

Moreover, whenever an element does not belong to (A ∪ B)′, then
it does not belong to A′ ∩ B′. It follows (why?) that every element
that does belong to A′ ∩ B′ must also belong to (A ∪ B)′, i.e.,

(4.2) A′ ∩ B′ ⊂ (A ∪ B)′.

From (4.1) and (4.2) we conclude that

(A ∪ B)′ = A′ ∩ B′

and this completes the proof of De Morgan’s law 7a.
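The construction of Table 8 can be mirrored in a few lines of code. In this sketch a boolean stands for the membership status of an element (True for ∈, False for ∉); the names `mark` and `table_8` are invented for illustration.

```python
from itertools import product

def mark(is_member):
    """Render a membership status the way the tables do."""
    return "∈" if is_member else "∉"

def table_8():
    """Return the rows of the membership table for (A ∪ B)′ and A′ ∩ B′.

    Each row holds seven booleans, one per column (1)-(7) of Table 8.
    """
    rows = []
    for in_a, in_b in product([True, False], repeat=2):
        union = in_a or in_b          # column (3): A ∪ B
        not_union = not union         # column (4): (A ∪ B)′
        not_a = not in_a              # column (5): A′
        not_b = not in_b              # column (6): B′
        meet = not_a and not_b        # column (7): A′ ∩ B′
        rows.append((in_a, in_b, union, not_union, not_a, not_b, meet))
    return rows

rows = table_8()
for row in rows:
    print("  ".join(mark(entry) for entry in row))

# Law 7a holds because columns (4) and (7) agree in every row.
columns_agree = all(row[3] == row[6] for row in rows)
```

The loop visits the four cases in the same order as the rows of the table, so the printed lines can be compared with Table 8 entry by entry.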


Before considering another example, we note the striking similarity
between the method of proof just illustrated and the method of
verifying relations between sets by means of Venn diagrams. In
Figure 9, we apply the latter method to the De Morgan law we have
just proved.

U: R₁ & R₂ & R₃ & R₄,    A: R₁ & R₂,    B: R₁ & R₃

A ∪ B: R₁ & R₂ & R₃          A′: R₃ & R₄
(A ∪ B)′: R₄                 B′: R₂ & R₄
                             A′ ∩ B′: R₄

Figure 9 (Venn diagram with regions R₁-R₄; drawing not reproduced)

In the space above the usual rectangle, we list our data:
the universal set U is represented by the entire rectangle, the set A
by the region R₁ & R₂, the set B by the region R₁ & R₃. (We use the
colon as shorthand for “is represented by” when it separates a set
and a region in the Venn diagram.) To the left of the rectangle, we
list the steps required to find the region represented by the left-hand
side of De Morgan’s law; to the right of the rectangle, we find the
region represented by the right-hand side of the law. We observe
that (A ∪ B)′ and A′ ∩ B′ are both represented by the same region
R₄. This fact constitutes the verification of De Morgan’s law 7a by
means of a Venn diagram.

Since the four regions in Figure 9 are numbered to correspond to
the four rows of Table 8, we can follow in the Venn diagram each
step in the construction of Table 8. Thus, the fact that column (1)
contains ∈'s in rows 1 and 2 is expressed in Figure 9 by the fact that
A is represented by R₁ & R₂. That columns (4) and (7) are identical
and contain ∈'s only in row 4 is expressed in Figure 9 by the fact that
both (A ∪ B)′ and A′ ∩ B′ are represented by region R₄. Although
the method of membership tables suffices for proving all the laws we
shall encounter, the method of Venn diagrams is often a helpful aid
in understanding these laws. We give one more example in which
both methods are used.


Example 4.2. To prove the distributive law 9b in Theorem 4.1, we
construct the membership table with eight rows (since three
arbitrary sets are involved) in Table 9. The law is proved by noting that
the columns headed A ∩ (B ∪ C) and (A ∩ B) ∪ (A ∩ C) are
identical.

TABLE 9

A  B  C   B ∪ C   A ∩ (B ∪ C)   A ∩ B   A ∩ C   (A ∩ B) ∪ (A ∩ C)
∈  ∈  ∈     ∈          ∈          ∈       ∈             ∈
∈  ∈  ∉     ∈          ∈          ∈       ∉             ∈
∈  ∉  ∈     ∈          ∈          ∉       ∈             ∈
∈  ∉  ∉     ∉          ∉          ∉       ∉             ∉
∉  ∈  ∈     ∈          ∉          ∉       ∉             ∉
∉  ∈  ∉     ∈          ∉          ∉       ∉             ∉
∉  ∉  ∈     ∈          ∉          ∉       ∉             ∉
∉  ∉  ∉     ∉          ∉          ∉       ∉             ∉


In Figure 10, this same distributive law is verified by using an
appropriate Venn diagram. We have numbered the eight regions in
Figure 10 to correspond to the eight rows of Table 9 in order to
bring out here, as in Example 4.1, the similarities in the membership
table and Venn diagram methods.

U: R₁ & R₂ & R₃ & R₄ & R₅ & R₆ & R₇ & R₈
A: R₁ & R₂ & R₃ & R₄,   B: R₁ & R₂ & R₅ & R₆,   C: R₁ & R₃ & R₅ & R₇

B ∪ C: R₁ & R₂ & R₃ & R₅ & R₆ & R₇        A ∩ B: R₁ & R₂
A ∩ (B ∪ C): R₁ & R₂ & R₃                  A ∩ C: R₁ & R₃
                                           (A ∩ B) ∪ (A ∩ C): R₁ & R₂ & R₃

Figure 10 (Venn diagram with regions R₁-R₈; drawing not reproduced)
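The Venn diagram method amounts to computing with sets of region numbers. The sketch below assumes the numbering of Figure 10, in which region k corresponds to row k of Table 9; the region lists for A, B, and C are inferred from that numbering, and the helper name `regions` is invented.

```python
# Region k of the Venn diagram corresponds to row k of Table 9.
U = frozenset(range(1, 9))
A = frozenset({1, 2, 3, 4})   # rows in which an element belongs to A
B = frozenset({1, 2, 5, 6})   # rows in which an element belongs to B
C = frozenset({1, 3, 5, 7})   # rows in which an element belongs to C

def regions(S):
    """Format a set of region numbers in the style used beside the figure."""
    return " & ".join(f"R{k}" for k in sorted(S))

left = A & (B | C)            # left-hand side of law 9b
right = (A & B) | (A & C)     # right-hand side of law 9b

print("A ∩ (B ∪ C):", regions(left))
print("(A ∩ B) ∪ (A ∩ C):", regions(right))
```

Both sides come out as the same three regions, which is the verification of law 9b by Venn diagram.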
We leave the proofs of the other laws in Theorem 4.1 as problems 
for the reader. In the next examples, we illustrate how to prove still 
other laws directly from those already known to be true. 


Example 4.3. If A and B are any subsets of a universal set U, then

(4.3) A = (A ∩ B) ∪ (A ∩ B′).

Proof.

(A ∩ B) ∪ (A ∩ B′) = A ∩ (B ∪ B′)    [by 9b]
                   = A ∩ U           [by 4a]
                   = A               [by 1b]

Example 4.4. If A and B are any subsets of a universal set U, then

(4.4) A = (A ∪ B) ∩ (A ∪ B′).

Proof.

(A ∪ B) ∩ (A ∪ B′) = A ∪ (B ∩ B′)    [by 9a]
                   = A ∪ ∅           [by 4b]
                   = A               [by 1a]


These examples enable us to make a point concerning the pairing
of the laws in Theorem 4.1. This was done in order that we may note
the so-called duality principle: If in any law we replace ∅ by U, U by
∅, ∪ by ∩, and ∩ by ∪ wherever they occur, then the result is again
a law. The new law is said to be the dual of the original law. Thus in
Theorem 4.1, law 1b is the dual of law 1a, 1a is the dual of 1b, and
so on for all a and b laws in our list. The dual of 5a is itself; law 5a is
therefore said to be self-dual.

Note that (4.3) and (4.4) are dual laws, and that the proof of (4.4) 
can be obtained from the proof of (4.3) by replacing each statement 
by its dual. Since Theorem 4.1 contains the dual of every one of its 
laws, we can justify each step in proving (4.4) by appealing to the 
dual of the law justifying the corresponding step in the proof of (4.3). 
In this way, we could prove the dual of any law whose proof followed 
from Theorem 4.1. Indeed, this is the essence of the duality principle. 
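Forming a dual is a purely mechanical exchange of symbols, which a short sketch can make concrete. It assumes that a law is written as a string using the characters ∅, U, ∪, and ∩, and that no set other than the universal set is named U; the function name `dual` is invented for illustration.

```python
# Swap table for the duality principle. "U" is the universal set and
# "∪" is the union sign; they are different characters.
_DUAL = str.maketrans({"∅": "U", "U": "∅", "∪": "∩", "∩": "∪"})

def dual(law):
    """Return the dual of a law written with the symbols ∅, U, ∪, ∩."""
    return law.translate(_DUAL)

law_1a = "A ∪ ∅ = A"
law_4_3 = "A = (A ∩ B) ∪ (A ∩ B′)"

print(dual(law_1a))    # the dual of law 1a
print(dual(law_4_3))   # the dual of (4.3)
```

Applying `dual` twice returns the original string, which matches the remark that 1a is the dual of 1b and 1b is the dual of 1a.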

The importance of the duality principle cannot be fully appreciated 
until the algebra of sets we have been discussing is treated formally 
as a mathematical system. In this more abstract study, known as 
Boolean algebra (after the English logician George Boole, 1815-1864), 
the algebra of sets becomes just one concrete interpretation of an 
abstract system which also has other important interpretations.* 
The interested student can consult the references listed at the end of 
this chapter for readings on Boolean algebra. 

In our later work, we shall need to consider the union and
intersection of more than two sets. Because of the associative laws 8a and
8b in Theorem 4.1, it is not necessary to use parentheses to show how
more than two sets separated by ∪ or by ∩ are paired. For example,
A ∩ B ∩ C ∩ D can be interpreted as (A ∩ B) ∩ (C ∩ D) or as
(A ∩ (B ∩ C)) ∩ D or as ((A ∩ B) ∩ C) ∩ D, since all of these
sets are equal. (See Problem 4.9.) Similar considerations apply to
the union of more than two sets, so we make the following general
agreement.

Definition 4.1. Let n be any positive integer and suppose B₁,
B₂, ···, Bₙ are given sets. Then the set of elements belonging to
all the given sets is denoted by

B₁ ∩ B₂ ∩ ··· ∩ Bₙ

and the set of elements belonging to at least one of the given sets is
denoted by

B₁ ∪ B₂ ∪ ··· ∪ Bₙ.

Many of the laws in Theorem 4.1 can now be generalized to hold
for unions and intersections of more than two sets. We collect some
of these formulas in the following theorem.


Theorem 4.2. Let n be any positive integer and suppose A, B₁, B₂,
···, Bₙ are subsets of a universal set U. Then

(4.5) (B₁ ∪ B₂ ∪ ··· ∪ Bₙ)′ = B₁′ ∩ B₂′ ∩ ··· ∩ Bₙ′.

(4.6) (B₁ ∩ B₂ ∩ ··· ∩ Bₙ)′ = B₁′ ∪ B₂′ ∪ ··· ∪ Bₙ′.

(4.7) A ∪ (B₁ ∩ B₂ ∩ ··· ∩ Bₙ)
          = (A ∪ B₁) ∩ (A ∪ B₂) ∩ ··· ∩ (A ∪ Bₙ).

(4.8) A ∩ (B₁ ∪ B₂ ∪ ··· ∪ Bₙ)
          = (A ∩ B₁) ∪ (A ∩ B₂) ∪ ··· ∪ (A ∩ Bₙ).


* The so-called statement calculus in logic is another interpretation of
Boolean algebra, and we can therefore expect that the logical analysis of
statements and the study of sets will have many common features. The
similarity between the use of truth tables in logic and our use of
membership tables is but one of many examples.

Proof. We prove (4.5) and leave the others as problems. Our
method of proof is by mathematical induction. When n = 1, both
sides of (4.5) reduce to B₁′, and so (4.5) is certainly true. If n = 2,
then (4.5) reduces to De Morgan's law 7a in Theorem 4.1 (applied
to the sets B₁ and B₂), and hence is again true.

Now suppose (4.5) is true when n = k, where k is any positive
integer. We complete our proof by mathematical induction if we
show that (4.5) must also be true when n = k + 1. By grouping the
sets as indicated, we obtain

(B₁ ∪ B₂ ∪ ··· ∪ Bₖ₊₁)′ = [(B₁ ∪ B₂ ∪ ··· ∪ Bₖ) ∪ Bₖ₊₁]′.

Applying De Morgan’s law 7a to the sets (B₁ ∪ B₂ ∪ ··· ∪ Bₖ) and
Bₖ₊₁, we get

(B₁ ∪ B₂ ∪ ··· ∪ Bₖ₊₁)′ = (B₁ ∪ B₂ ∪ ··· ∪ Bₖ)′ ∩ Bₖ₊₁′
                        = (B₁′ ∩ B₂′ ∩ ··· ∩ Bₖ′) ∩ Bₖ₊₁′,

the last equality following by our induction hypothesis that (4.5) is
true when n = k. But the parentheses are not necessary in this last
expression, so we can write

(B₁ ∪ B₂ ∪ ··· ∪ Bₖ₊₁)′ = B₁′ ∩ B₂′ ∩ ··· ∩ Bₖ₊₁′,

which is precisely (4.5) when n = k + 1.

We have now shown (4.5) is true when n = 1 and n = 2 and,
furthermore, that its truth when n = k + 1 follows from its truth
when n = k for any positive integer k. We conclude by the principle
of mathematical induction that (4.5) is true for all positive integers
n. This completes the proof.


Note that (4.5) and (4.6) are generalizations of De Morgan’s laws, 
whereas (4.7) and (4.8) are generalized distributive laws. 
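Formulas (4.5)-(4.8) can be tried on concrete sets by folding the two-set operations over a list, in the spirit of the induction argument. This is a numerical illustration only; the universal set, the sets A and B₁, B₂, B₃, and the helper names are all invented.

```python
from functools import reduce

U = frozenset(range(10))                  # hypothetical universal set
A = frozenset({0, 1, 2, 3})
Bs = [frozenset({1, 2, 5}),
      frozenset({2, 3, 5, 7}),
      frozenset({2, 4, 5, 9})]

def comp(S):
    """Complement with respect to U."""
    return U - S

def union_all(sets):
    """B1 ∪ B2 ∪ ... ∪ Bn, built up two sets at a time."""
    return reduce(frozenset.union, sets, frozenset())

def intersect_all(sets):
    """B1 ∩ B2 ∩ ... ∩ Bn, built up two sets at a time."""
    return reduce(frozenset.intersection, sets, U)

checks = {
    "4.5": comp(union_all(Bs)) == intersect_all([comp(B) for B in Bs]),
    "4.6": comp(intersect_all(Bs)) == union_all([comp(B) for B in Bs]),
    "4.7": A | intersect_all(Bs) == intersect_all([A | B for B in Bs]),
    "4.8": A & union_all(Bs) == union_all([A & B for B in Bs]),
}
```

Starting the folds from ∅ and from U relies on the identity laws 1a and 1b, so the helpers give sensible results even for a single set.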


PROBLEMS 


4.1. Only laws 7a and 9b of Theorem 4.1 are proved in the text. Construct 
membership tables for the other laws, and thus complete the proof of 
Theorem 4.1. Also verify each law by means of an appropriate Venn 
diagram whose regions are numbered to correspond to the rows of the 
membership table for that law. 


4.2. How many rows are required in a membership table for a law involving
     four arbitrary sets? five sets? n sets, where n is any positive integer?

4.3. Construct membership tables and thus show that the following laws
     hold for any subsets A, B, C of a universal set U.
     (a) (A′ ∩ B′)′ = A ∪ B
     (b) [A′ ∩ (A ∪ B)]′ = A ∪ B′
     (c) (A ∩ B) ∩ (A ∩ B′) = ∅
     (d) A ∩ (A ∪ B) = A ∪ (A ∩ B) = A
     (e) [A ∩ (B ∩ C)]′ = A′ ∪ B′ ∪ C′

4.4. Prove each of the laws in the preceding problem by using only the laws
     in Theorem 4.1. Indicate the law in Theorem 4.1 which justifies each
     step in your proof.

4.5. Verify each of the laws in Problem 4.3 by the method of Venn diagrams.

4.6. (a) It is clear from a Venn diagram that if A and B are disjoint sets,
         then A ∩ C and B ∩ C are also disjoint. But prove this result by
         showing that if A ∩ B = ∅, then (A ∩ C) ∩ (B ∩ C) = ∅. Justify
         each step in your proof by appealing to either the hypothesis
         or to one of the laws in Theorem 4.1.
     (b) Show by examples that A ∩ B ∩ C = ∅ does not imply A ∩ B = ∅
         or A ∩ C = ∅ or B ∩ C = ∅.

4.7. (a) Consider the following valid argument.
         Hypotheses. (1) All college students are beer drinkers.
                     (2) All beer drinkers are well dressed.
         Conclusion. Therefore all college students are well dressed.
         Write the hypotheses and the conclusion using symbols of set
         theory (see Problem 3.13), and then prove the argument is valid;
         i.e., the conclusion is true whenever the hypotheses are both true.
         Each step in your proof should be justified by appealing to one of
         the hypotheses or to one of the laws in Theorem 4.1.
     (b) Following the procedure outlined in part (a), prove that the
         following argument is valid.
         Hypotheses. (1) All college students are beer drinkers.
                     (2) No beer drinkers are well dressed.
         Conclusion. Therefore no college students are well dressed.

4.8. If A and B are subsets of a universal set U, the symmetric difference
     of A and B, denoted by A △ B, is defined as the set

         A △ B = (A ∩ B′) ∪ (A′ ∩ B).

     (a) Construct a membership table for A △ B.
     (b) In an appropriate Venn diagram, identify the region representing
         the set A △ B.
     (c) By means of membership tables, prove each of the following laws:
           (i) A △ ∅ = A
          (ii) A △ U = A′
         (iii) A △ A = ∅
          (iv) A △ A′ = U
           (v) A △ B = B △ A
          (vi) A △ (B △ C) = (A △ B) △ C
         (vii) A ∩ (B △ C) = (A ∩ B) △ (A ∩ C)
     (d) Prove each of the laws in part (c) by showing that they follow
         from the laws in Theorem 4.1.
     (e) Use Venn diagrams to verify each of the laws in part (c).
     (f) Show that

             A △ (B ∩ C) = (A △ B) ∩ (A △ C)

         is not a law, i.e., there are sets A, B, and C for which the
         equality does not hold. By means of a membership table, or
         otherwise, determine what additional information about the sets A, B,
         and C suffices to guarantee that the equality does hold.

4.9. Prove that (A ∩ B) ∩ (C ∩ D) and (A ∩ (B ∩ C)) ∩ D are equal
     sets, supporting each step in your proof by citing a law in Theorem 4.1.

4.10. Prove Formulas (4.6)-(4.8) by mathematical induction.

4.11. You are given at least one subset of a specified universal set U and are
      instructed to form all possible sets from the given sets by using the
      operations denoted by ∩, ∪, and ′. Any new sets you obtain this
      way are also to be used (with other new sets or with the given sets) to
      form still other sets by using these same operations. Identify all the
      different sets you end up with by continuing this process indefinitely
      if you are originally given the following set(s): (a) U, (b) ∅, (c) A (= a
      subset of U), (d) A and A′.

4.12. Let 𝒜 be a set of subsets of some fixed universal set U, i.e., the elements
      of 𝒜 are subsets of U. The set 𝒜 is called an algebra of sets if it satisfies
      the following conditions:
      (1) 𝒜 is not empty.
      (2) If A ∈ 𝒜, then A′ ∈ 𝒜.
      (3) If A ∈ 𝒜 and B ∈ 𝒜, then A ∪ B ∈ 𝒜.
      Prove the following theorems, assuming 𝒜 is an algebra of sets.
      (a) U ∈ 𝒜.
      (b) ∅ ∈ 𝒜.
      (c) If A ∈ 𝒜 and B ∈ 𝒜, then A ∩ B ∈ 𝒜.

4.13. Show that each of the following is an algebra of sets. (Cf. Problems 4.11
      and 4.12.)
      (a) 𝒜 = {U, ∅}.
      (b) 𝒜 = {U, ∅, A, A′}, where A is a subset of U.
      (c) 𝒜 = 2^U, the set of all subsets of U.
5. Cartesian product sets 


A pair of objects in which we distinguish one of the objects as the 
first and the other (which need not be different) as the second is 
called an ordered pair. If the first object is called a and the second b, 
then the ordered pair is written (a, b). This ordered pair is quite 
different from the set {a, b} containing the two objects a and b.
There is no first element in {a, b}, since order is immaterial when
listing elements of a set. Although {a, b} = {b, a}, we want to
distinguish between (a, b) and (b, a). We shall define two ordered pairs
to be equal if and only if their first objects are the same and their
second objects are the same, i.e.,

(5.1) (a, b) = (c, d) if and only if a = c and b = d.

We have already used ordered pairs, and they are needed even
more in our later work. In Example 1.5 and Problems 1.5 and 2.7,
we considered ordered pairs of real numbers. Relative to some
rectangular coordinate system, we interpreted (x, y) as representing a
point in the plane determined by the coordinate axes. According to
(5.1), two such ordered pairs of real numbers are equal if and only if
they represent the same point.


The objects in an ordered pair need not be numbers. For example,
when we toss a coin twice, we can represent the outcome as one of
the ordered pairs (H, H), (H, T), (T, H), (T, T), where we write H
for “heads” and T for “tails.” We agree that the first object in the
ordered pair denotes the result of the first toss and the second object
denotes the result of the second toss. Since we want to distinguish
between the outcomes (H, T) and (T, H), the use of ordered pairs is
essential.

It is of some interest that the concept of an ordered pair can be
defined in terms of sets and that (5.1) can then be proved. To
characterize an ordered pair, it is sufficient to state what two objects make
up the pair and also which is to be considered as the first object. Thus
the ordered pair (a, b) is determined if we know the set {a, b} of
objects in the ordered pair and the set {a} identifying the first object.
We are thereby led to the following definition.

Definition 5.1. Let a and b be any objects. The ordered pair (a, b)
is defined by

(5.2) (a, b) = {{a, b}, {a}}.
Theorem 5.1. Two ordered pairs (a, b) and (c, d) are equal if and
only if a = c and b = d.

Proof. From the definition, we have

(c, d) = {{c, d}, {c}}

and this set is clearly identical to the set defining (a, b) if a = c and
b = d. This observation constitutes the proof of the “if” part of the
theorem. To prove the “only if” part, we assume (a, b) = (c, d), i.e.,

(5.3) {{a, b}, {a}} = {{c, d}, {c}},

and proceed to prove that a = c and b = d. Now two sets are equal
only when they have the same elements. Therefore it follows from
(5.3) that either (1) {a, b} = {c, d} and {a} = {c}, or (2) {a, b} = {c}
and {a} = {c, d}.

If case (1) holds, then from {a} = {c} we conclude a = c, and
{a, b} = {c, d} then implies b = d. Thus the theorem is true in case
(1).

If case (2) holds, we start from {a, b} = {c} and, recalling that
{a, a} = {a}, conclude that a = b = c. Then {a} = {c, d} becomes
{a} = {a, d}, from which d = a follows. Hence in case (2) we have
a = b = c = d, and the theorem is certainly true. This completes
the proof.
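The set-theoretic definition of an ordered pair can be modeled directly, which gives a small-scale check of Theorem 5.1. In this sketch Python's `frozenset` plays the role of a set, the function name `pair` is invented, and the exhaustive check covers only objects drawn from a three-element range.

```python
from itertools import product

def pair(a, b):
    """Model the ordered pair (a, b) as the set {{a, b}, {a}}."""
    return frozenset({frozenset({a, b}), frozenset({a})})

# Order matters for pairs even though it does not for sets.
distinct = pair("H", "T") != pair("T", "H")

# Theorem 5.1, checked for all a, b, c, d taken from {0, 1, 2}.
theorem_holds = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(range(3), repeat=4)
)
```

The check includes the degenerate case a = b, which is exactly case (2) in the proof.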


Whenever we have two sets, we can always form ordered pairs by
taking the first object of the pair from one of the sets and the second
object from the second set. This simple observation turns out to be
quite important and, as usual, involves a special notation and
terminology.

Definition 5.2. If A and B are sets, then the set of all ordered pairs
(a, b) such that a belongs to A and b belongs to B is called the
Cartesian product of A and B, and is denoted by A × B. In symbols,

A × B = {(a, b) | a ∈ A and b ∈ B}.

We now have still another way of obtaining new sets from given
sets: form Cartesian product sets.


Example 5.1. If A = {H, T} and B = {1, 2, 3}, then

A × B = {(H, 1), (H, 2), (H, 3), (T, 1), (T, 2), (T, 3)},

B × A = {(1, H), (1, T), (2, H), (2, T), (3, H), (3, T)},

A × A = {(H, H), (H, T), (T, H), (T, T)},

B × B = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}.

This example enables us to indicate the reason for the importance of
Cartesian product sets in probability. Consider the following
experiments: (1) toss a coin, and (2) choose a number from among the first
three positive integers. Each element of A represents an outcome of
the coin-tossing experiment, each element of B an outcome of the
second experiment. Now think of the composite experiment in which
we first toss a coin and then choose a number. Outcomes can be
represented by ordered pairs, like (H, 2), indicating the result of each
part of the composite experiment. Thus the outcomes of the
composite experiment are given by the Cartesian product set A × B.

Note that B × A is not the same set as A × B. The set B × A
yields outcomes of the different composite experiment in which we
first choose a number and then toss a coin. If we toss the coin twice,
we obtain an outcome represented by an element of A × A. Finally,
if we choose a number and then choose another number, each from
the set B, then outcomes of this composite experiment correspond to
elements of B × B. We hasten to add that not all composite
experiments lead to Cartesian product sets. For example, if we choose a
number from the set B and then choose another number from the
remaining numbers, the set of possible outcomes is

{(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)},

which is not a Cartesian product set, although its elements are
ordered pairs. These matters are taken up more fully in the next
chapter.
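The sets of Example 5.1 can be generated rather than listed by hand. The sketch below uses `itertools.product` for the Cartesian products and `itertools.permutations` for the experiment in which the second number is chosen from the remaining numbers; Python tuples stand in for ordered pairs, and the variable names are invented.

```python
from itertools import permutations, product

A = ["H", "T"]
B = [1, 2, 3]

A_x_B = set(product(A, B))          # toss a coin, then choose a number
B_x_A = set(product(B, A))          # choose a number, then toss a coin
A_x_A = set(product(A, repeat=2))   # toss the coin twice
B_x_B = set(product(B, repeat=2))   # choose twice, repeats allowed

# Choosing the second number from the remaining ones gives ordered
# pairs of distinct numbers: a proper subset of B × B.
without_replacement = set(permutations(B, 2))
```

The last set has six elements while B × B has nine; the three missing pairs are those that repeat a number.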


Example 5.2. We have previously mentioned our interpretation of
ordered pairs of real numbers as points in a plane. If R is the set of
all real numbers, then there is one and only one point corresponding
to each ordered pair in R × R, and one and only one ordered pair in
R × R corresponding to each point. Indeed, this one-to-one
correspondence between points and ordered pairs of real numbers is the
fundamental idea of plane analytic geometry. A plane with axes is
called a Cartesian plane, after René Descartes (1596-1650), one of
the inventors of analytic geometry. The graph of any subset of
R × R is defined as the set of points corresponding to ordered pairs
of the subset. For example, the set

{(x, y) | x² + y² = 9}

is a subset of R × R whose graph is the circle with center at the
origin and radius 3 units. The graph of R × R itself is the entire
plane.


We can define ordered triples, and in general, ordered r-tuples, in 
terms of ordered pairs. An ordered triple (or 3-tuple), for example, is 
an ordered pair whose first member is an ordered pair, i.e., 


(5.4) (a, b, c) = ((a, b), c). 


Similarly, we define an ordered quadruple (4-tuple) as an ordered pair
whose first member is an ordered triple, i.e.,

(a, b, c, d) = ((a, b, c), d).

In general, an ordered r-tuple is defined as follows:

(5.5) (a₁, a₂, ···, aᵣ) = ((a₁, a₂, ···, aᵣ₋₁), aᵣ).

From these definitions it can be proved that two ordered r-tuples
are equal if and only if their corresponding objects are equal, i.e.,

(a₁, a₂, ···, aᵣ) = (b₁, b₂, ···, bᵣ)

if and only if

a₁ = b₁, a₂ = b₂, ···, aᵣ = bᵣ.

We leave the proof for the problems.
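The recursive definition in (5.4) and (5.5) can be written as a fold over the list of objects. In this sketch a two-element Python tuple stands in for an ordered pair, and the function name `nested` is invented for illustration.

```python
from functools import reduce

def nested(*objects):
    """Build an ordered r-tuple (r >= 2) as nested ordered pairs."""
    if len(objects) < 2:
        raise ValueError("an ordered r-tuple needs at least two objects")
    # Each step pairs everything built so far with the next object.
    return reduce(lambda head, last: (head, last), objects)

triple = nested("a", "b", "c")          # ((a, b), c), as in (5.4)
quadruple = nested("a", "b", "c", "d")  # (((a, b), c), d)
```

Two nested structures built this way compare equal exactly when their objects agree position by position, which is the statement left to the problems.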
Since the Cartesian product of two sets A and B is a set of ordered
pairs (2-tuples), it comes as no surprise that we can define the
Cartesian product of r sets as a set of r-tuples.

Definition 5.3. We suppose that r is a positive integer greater than
1 and that A₁, A₂, ···, Aᵣ are sets. The set of all ordered r-tuples
(a₁, a₂, ···, aᵣ) such that a₁ belongs to A₁, a₂ belongs to A₂, ···, aᵣ
belongs to Aᵣ is called the Cartesian product of the sets A₁, A₂, ···, Aᵣ
and is denoted by A₁ × A₂ × ··· × Aᵣ. In symbols,

A₁ × A₂ × ··· × Aᵣ = {(a₁, a₂, ···, aᵣ) | aⱼ ∈ Aⱼ for j = 1, 2, ···, r}.


Example 5.3. Suppose A = {H, T}. Then the Cartesian product 
set A × A × A is the set of ordered 3-tuples in which each of the 
three objects is either an H or a T. One such 3-tuple is (H, T, H); 
there are eight altogether. As in the discussion following Example 
5.1, each such 3-tuple can be thought of as representing an outcome 
of the composite experiment in which a coin is tossed three times. 


If the number of elements in each of the sets A₁, A₂, ···, Aᵣ is
given, then we ought to be able to determine the number of ordered
r-tuples in the Cartesian product of these sets. We denote by n(A)
the number of elements in the set A.

Theorem 5.2. If r is any positive integer and A₁, A₂, ···, Aᵣ any
sets, then

(5.6) n(A₁ × A₂ × ··· × Aᵣ) = n(A₁)n(A₂) ··· n(Aᵣ).

Proof. There are as many elements in A₁ × A₂ × ··· × Aᵣ as there
are r-tuples (a₁, a₂, ···, aᵣ) where a₁ ∈ A₁, a₂ ∈ A₂, ···, aᵣ ∈ Aᵣ. The
object a₁ can be chosen in n(A₁) different ways. We can then choose the
object a₂ in n(A₂) different ways, and we continue in this way until
we come to the last object aᵣ, which can be chosen in n(Aᵣ) different
ways. Hence, by the fundamental principle of counting (p. 9), the
number of r-tuples is given by the product n(A₁)n(A₂) ··· n(Aᵣ) and
the theorem is proved.


Example 5.4. Let

A = {H, T} and B = {1, 2, 3, 4, 5, 6}.

Then the Cartesian product set A × A × B has

n(A)n(A)n(B) = 2 · 2 · 6 = 24 elements.

Each element is a 3-tuple which can be interpreted as representing
one of the 24 possible outcomes of the experiment in which we throw
a coin twice and then roll a die. For example, (H, H, 6) would denote
that both tosses resulted in heads and the die showed the number 6
on its uppermost face.


PROBLEMS 


5.1. Let $A = \{1, 2\}$, $B = \{2, 3\}$, and $C = \{3\}$ be subsets of the universal
set $U = \{1, 2, 3\}$. List the elements of the following sets:


(a) $A \times A$    (b) $C \times C$

(c) $A \times B$    (d) $B \times A$

(e) (A x B) ^ (B x 0) (D (AX B) G XO 
(g) $(A \times U) \cap (U \times B)$    (h) $A \times B \times C$

5.2. Sketch the graph of the sets in (a)-(d) of the preceding problem. How
would you sketch the graph of the set in part (h)?

5.3. (a) Show that $A \times B = B \times A$ if $A = B$. Is the converse true?
(b) Show that $A \times B = \emptyset$ if and only if $A = \emptyset$ or $B = \emptyset$.
(c) Show that $A \times B \subset C \times D$ if $A \subset C$ and $B \subset D$. Is the converse
true?



5.4. (a) Prove that $A \times (B \cap C) = (A \times B) \cap (A \times C)$.
(b) Prove that $A \times (B \cup C) = (A \times B) \cup (A \times C)$.

5.5. Let A and B be subsets of some universal set $U$. Prove that

$(A \times U) \cap (U \times B) = A \times B.$

5.6. The ordered pair (a, b) is defined as a certain set earlier in this section.
How many different elements does the set (a, b) contain? (Do not fail
to consider the case when a = b.)

5.7. (a) With ordered 3-tuples defined by Formula (5.4), show that

(a, b, c) = (d, e, f)

if and only if a = d, b = e, and c = f.
(b) More generally, if r is any positive integer greater than 1, prove by
mathematical induction that two ordered r-tuples are equal if and
only if their corresponding objects are equal.

5.8. Let A be a set with n elements. How many elements are in each of the
following sets?

(a) $A \times A$

(b) $\{(x, y) \mid x \in A,\ y \in A, \text{ and } x \ne y\}$

(c) $A \times A \times A$

(d) $\{(x, y, z) \mid x \in A,\ y \in A,\ z \in A,\ x \ne y,\ x \ne z,\ y \ne z\}$

(e) For each set in (a)-(d), describe an experiment whose outcomes can
be represented by elements of the set.


SUPPLEMENTARY READING 


1. Breuer, J., Introduction to the Theory of Sets, translated by H. F. Fehr,
Prentice-Hall, Inc., 1958.

2. Kemeny, J. G., J. L. Snell, and G. L. Thompson, Introduction to
Finite Mathematics, Prentice-Hall, Inc., 1957.

3. Mathematical Association of America, Committee on the Under-
graduate Program, Elementary Mathematics of Sets, 1958.

4. May, K. O., Elements of Modern Mathematics, Addison-Wesley Pub-
lishing Company, Inc., 1959.

5. Suppes, P., Introduction to Logic, D. Van Nostrand Company, Inc.,
1957.


Chapter 2


PROBABILITY IN FINITE 
SAMPLE SPACES 


1. Sample spaces 


Probability questions arise when we think of real or conceptual 
experiments and their outcomes. Therefore, our first task in the pre- 
cise formulation of probability theory must be to discover a suitable 
mathematical way by which an experiment can be specified. 

Think of tossing a coin. We ordinarily agree to regard “head” and 
"tail" as the only possible outcomes. If we denote these outcomes by 
H and T respectively, then each outcome of the experiment would 
correspond to exactly one of the elements of the set $\{H, T\}$. This
set is called a sample space for the experiment.

Now let us toss a penny and a nickel. How shall we record the 
outcome of this experiment? Each time we toss the coins, we can 
write down the number of heads obtained. Accordingly, each out- 
come of the coin-tossing experiment corresponds to exactly one of 
the elements of the set $S_1 = \{0, 1, 2\}$. $S_1$ is a sample space for the
experiment. We say “a” rather than “the” sample space, since we can
think of other ways of describing the outcomes of this same experi-
ment. Indeed, were we to toss the coins and record, let us say, only
that we obtained one head, we are then embarrassed by the question,
“Did the penny fall heads?” Our method of classifying outcomes was
too coarse; we lost information by merely recording the number of
heads obtained.

We get a finer classification by recording whether both coins fall
heads (HH), the penny falls heads and the nickel tails (HT), the
penny falls tails and the nickel heads (TH), or both coins fall tails
(TT). Now each outcome of the experiment corresponds to exactly
one element in the set

$S_2 = \{HH, HT, TH, TT\}.$

$S_2$ is another sample space for this experiment. We recognize $S_2$ as
the Cartesian product $A \times A$, where $A = \{H, T\}$ and where we have
introduced simplified notation for ordered pairs, writing HH for
(H, H), HT for (H, T), etc. When, as in this example, there is no
possibility of misinterpretation, we shall often use this less cumber-
some notation for ordered r-tuples.
This situation is typical of most examples. Whether to classify 
outcomes one way or another is not a question our theory answers. 
Let us therefore agree at the outset that there is no one correct 
sample space for a given experiment. Different people or even the 
same person at different times may describe the outcomes differently. 


We insist only that any sample space meet the requirements in the 
following definition. 


Definition 1.1. A sample space S associated with a real or con-
ceptual experiment is a set such that (1) each element of S denotes
an outcome of the experiment, and (2) any performance of the ex-
periment results in an outcome that corresponds to one and only one
element of S.


Although many sample spaces may meet these requirements, and 
hence serve to describe the same experiment, we have seen that one 
may be more suitable than another. In general, it is a safe guide to 
include as much detail as possible in the description of the outcomes 
of the experiment. Imagine that you are recording the outcome in a
notebook and insist that what you write enables you to answer all 
pertinent questions concerning the result of the experiment. 


Example 1.1. Let a green die and a red die be rolled. The set

$S_1 = \{\text{0 sixes},\ \text{exactly 1 six},\ \text{2 sixes}\}$

is a set that meets the requirements of Definition 1.1, and hence can
serve as a sample space for this experiment. So can the set

$S_2 = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\},$

if we understand that an element of $S_2$ stands for the sum of the num-
bers on the dice. But neither $S_1$ nor $S_2$ involves a fine enough classi-
fication of outcomes to answer the question, “Is the number on the
red die greater than the number on the green die?” To take care of
all relevant questions, we should record the numbers on each of the
dice. We are thus led to take as sample space the set S (defined on
p. 12) containing 36 ordered pairs, it being understood that $(x, y)$
denotes the outcome in which the green die shows the number $x$ and
the red die the number $y$. Since $x$ and $y$ are themselves integers in
the set

$D = \{1, 2, 3, 4, 5, 6\},$

the sample space S can be written as a Cartesian product:

(1.1)  $S = D \times D = \{(x, y) \mid x \in D \text{ and } y \in D\}.$

Note that D itself can serve as a sample space for the experiment in
which one die is rolled. Finally, let us observe that

$S_3 = \{\text{0 sixes},\ \text{2 sixes}\}$

and

$S_4 = \{\text{0 sixes},\ (1, 6),\ \text{exactly 1 six},\ (6, 6)\}$

are examples of sets that cannot serve as sample spaces for the two-
dice experiment. Both sets violate condition (2) in Definition 1.1:
the outcome (1, 6), for example, corresponds to no element of $S_3$ and
to two elements of $S_4$.
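
Condition (2) of Definition 1.1 can be checked mechanically for the two-dice experiment. The following is only a sketch under a modeling assumption that is not in the text: each element of a candidate sample space is represented by a predicate saying which outcomes (green, red) it describes. The names `sixes`, `is_pair`, `matches`, and `satisfies_condition_2` are hypothetical.

```python
from itertools import product

D = range(1, 7)
outcomes = list(product(D, D))  # (green, red) pairs, 36 in all


def sixes(k):
    """Label 'exactly k sixes' as a predicate on an outcome."""
    return lambda o: o.count(6) == k


def is_pair(p):
    """Label consisting of one specific ordered pair."""
    return lambda o: o == p


# Assumed encoding of the candidate sets S1, S3, S4 from Example 1.1.
candidates = {
    "S1": [sixes(0), sixes(1), sixes(2)],
    "S3": [sixes(0), sixes(2)],
    "S4": [sixes(0), is_pair((1, 6)), sixes(1), is_pair((6, 6))],
}


def matches(labels, outcome):
    """Number of elements of the candidate set that describe this outcome."""
    return sum(label(outcome) for label in labels)


def satisfies_condition_2(labels):
    """True if every outcome corresponds to one and only one element."""
    return all(matches(labels, o) == 1 for o in outcomes)


for name, labels in candidates.items():
    print(name, satisfies_condition_2(labels), matches(labels, (1, 6)))
```

Under this encoding the outcome (1, 6) is matched by no element of S3 and by two elements of S4, in line with the remark in the example.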


A sample space can be an infinite set. For example, toss a coin
until it falls heads for the first time. It is logically conceivable that
we get an unending sequence of tails and that a head is never ob-
tained. Call this outcome $\omega$. If a head is obtained, we specify the
outcome by recording the number of the toss that produced the first
head. Our sample space is

$S = \{\omega, 1, 2, 3, \dots\},$

which is clearly an infinite set. As another example, let an experiment
consist of selecting one point from among the points on some line of
unit length. (This conceptual experiment can be carried out, at least
with our mind’s eye, by imagining an exceptionally pointed dart
thrown at a line segment.) Since we can associate a unique real num-
ber with each point on the line, we can take as sample space the
infinite set

$S = \{x \in R \mid 0 \le x \le 1\},$

where R denotes the set of all real numbers.

As our discussion indicates, one way of formulating in a precise 
way the notion of an experiment is to write down an associated 
sample space. Since we shall always speak of the probability of an 
event in connection with some real or conceptual experiment, our 
mathematical theory begins with the specification of a sample space by 
which we define the experiment. Although the general theory of proba- 
bility deals with both finite and infinite sample spaces, in this book 
we restrict our attention to finite sample spaces only. 


Example 1.2. From a large group of people, r are selected and their
birthdays (but not birth years) are recorded. We want to specify a
sample space associated with this experiment. Let us number the
days of the year 1, 2, 3, ..., 365 and omit people born in leap years
on February 29. A typical outcome of the experiment might be the
ordered r-tuple (17, 3, 131, ..., 78), the first number of which is the
birthday of the first person selected, the second number the birthday
of the second person selected, etc. Each birthday is an element of
the set

$A = \{1, 2, 3, \dots, 365\},$

but an outcome of the experiment is recorded only when we write
down an r-tuple of numbers, each selected from A. (Cf. Problem
I.2.20.)* Hence we define the sample space given by the Cartesian
product set

(1.2)  $S = A \times A \times \cdots \times A = \{(x_1, x_2, \dots, x_r) \mid x_i \in A \text{ for } i = 1, 2, \dots, r\}.$

The sample space S contains $365^r$ elements. If $r \ge 4$, then S con-
tains more than a billion r-tuples. Nevertheless, S is a finite sample
space for all values of r. Therefore, probability questions about this
experiment will be answered by our theory. (We continue this prob-
lem in Example 2.1.)


* To refer in any chapter to a theorem, definition, example, problem, or formula
in that same chapter, we use only the number by which it is identified in the text.
But to refer to one of these items appearing in another chapter, we prefix its identi-
fying number with a roman numeral identifying the chapter. For example, we
write Problem I.2.20 to denote the twentieth problem in Section 2 of Chapter 1.
Were we to write Problem 2.20 here, we would mean to refer to the twentieth
problem in Section 2 of the present chapter, namely Chapter 2.


Example 1.3. What sample space should we use for the experiment
in which a bridge hand is dealt from an ordinary deck of cards? Since
we care only about which 13 cards make up the hand, and not about the
order in which they are dealt, we can consider a bridge hand as a sub-
set of 13 cards from the set of 52 cards in the deck. Let us write
$A_s, K_s, \dots, 2_s$ to denote the ace, king, ..., deuce of spades, reserving
subscripts h, d, and c to indicate cards that are hearts, diamonds, and
clubs, respectively. Then

(1.3)  $D = \{A_s, \dots, 2_s,\ A_h, \dots, 2_h,\ A_d, \dots, 2_d,\ A_c, \dots, 2_c\}$

is a set of 52 elements representing the full deck. For our experiment,
we take as sample space the set S of all 13-element subsets of D. In
symbols,

(1.4)  $S = \{B \mid B \subset D \text{ and } n(B) = 13\}.$

Probability questions concerning bridge hands will be answered by
our theory, since S is a finite set. The problem of determining $n(S)$,
i.e., of counting the number of possible bridge hands, is taken up in
Chapter 3.


PROBLEMS 

1.1. We describe certain experiments. In each case specify an appropriate
sample space for this experiment.
(a) A card is selected from a standard deck of cards.
(b) Three coins are tossed.
(c) A boy has a penny, a nickel, a dime, and a quarter in his pocket.
He takes two coins out of his pocket, one after the other.
(d) Two distinguishable objects are distributed in two numbered cells.
(e) Two indistinguishable objects are distributed in two numbered cells.
(f) A survey of families with two children is made and the sexes of the
children (the older child first) are recorded.
(g) A survey of families with three children is made and the sexes of the
children (in order of age, oldest child first) are recorded.
(h) A survey of families with r children is made and the sexes of the
children (in order of age, oldest child first) are recorded.
(i) r coins are tossed.
(j) A poker hand (five cards) is dealt from an ordinary deck of cards.

1.2. Six boys in a club select a committee of three. The boys are A, B, C,
D, E, and F.

(a) List the 20 elements of the appropriate sample space S for this
experiment.

(b) Find the subset of S containing those outcomes in which A is se-
lected. How many elements does this subset contain?

(c) Find the subset of S containing those elements in which both A and
B are selected. How many elements does this subset contain?

(d) Find the subset of S containing those outcomes in which A or B is
selected. How many elements in this subset?

(e) Find the subset of S containing those outcomes in which A is not
selected. How many elements in this subset?

1.3. Refer to part (c) of Problem 1.1. For how many outcomes of the sample
space is it the case that the boy takes less than 15 cents out of his pocket?

1.4. Refer to the sample space S of 36 elements in Example 1.1 of the text.
Let E denote the subset of S whose elements denote outcomes for which
the sum of the numbers on the dice is greater than 9, and F the subset
whose elements denote outcomes for which the numbers on the dice are
equal. Determine the elements in the following sets:


() EOF @ EF 
(b BUF () E' 
(c) P'AP (D F' 


1.5. An experiment consists of selecting one chip from a hat containing six
chips numbered 1, 2, 3, 4, 5, and 6. Of the following sets, state which
are suitable sample spaces for this experiment and which are unsuitable:
(a) S = {1, 2, 3, 4, 5, 6}

(b) S = {1, 2, 3, 4, 5}

(c) S = {odd number, even number}

(d) S = {1, 3, 5, even number}

(e) S = {1, 2, number less than 6, 6}

(f) S = {number less than 3, 3, number greater than 3}

1.6. An experiment consists of selecting r light bulbs from the lot produced
by a machine and testing them. A bulb can be good (G) or bad (B).
Define a sample space for this experiment and compare your sample
space with those in Problems 1.1(h) and 1.1(i). What observation do
you make and what lesson is learned thereby?

1.7. Urn 1 contains one black and two white balls. Urn 2 contains two black
and one white ball. An experiment consists of first selecting an urn and
then drawing a ball from this urn. Define a suitable sample space for
this experiment.


2. Events


The theory of probability begins when a sample space S, the 
mathematical counterpart of an experiment, is specified. The sample 
space serves as the universal set for all questions concerned with the
experiment. 

We may be interested in the occurrence of a variety of events when 
an experiment is under consideration. For example, think of the ex- 
periment of tossing a coin three successive times and let 


(2.1)  $S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$


be the associated sample space. We may be interested in the event 
“the number of heads exceeds the number of tails." For any outcome 
of the experiment we can determine whether this event does or does 
not occur. We find that HHH, HHT, HTH, and THH are the only
elements of S corresponding to outcomes for which this event does
occur; if the experimental outcome corresponds to one of the other
elements of S, then the event in question does not occur. Thus, to
say that the event “the number of heads exceeds the number of
tails” occurs is the same as saying the experiment results in an out-
come corresponding to an element of the set

$A = \{HHH, HHT, HTH, THH\}.$

We recognize A as a subset of the sample space S. The subset A can
be taken as the mathematical counterpart of the event “the number
of heads exceeds the number of tails.” Similarly, we find the follow-
ing correspondence between various events and subsets of S:


Verbal Description of Event          Corresponding Subset of S

Number of heads exceeds
  number of tails                    $A = \{HHH, HHT, HTH, THH\}$
Number of heads is exactly 2         $B = \{HHT, HTH, THH\}$
Number of heads is at least 2        $C = \{HHH, HHT, HTH, THH\}$
Second toss is heads                 $D = \{HHH, HHT, THH, THT\}$
All tosses show the same face        $E = \{HHH, TTT\}$
Number of heads is less than 2       $C' = \{HTT, THT, TTH, TTT\}$
Second toss is not heads             $D' = \{HTH, HTT, TTH, TTT\}$
Second toss is heads and the
  number of heads is exactly 2       $D \cap B = \{HHT, THH\}$
Second toss is heads or the
  number of heads is exactly 2       $D \cup B = \{HHH, HHT, HTH, THH, THT\}$
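
The correspondence between verbal descriptions and subsets can be reproduced with ordinary set operations. This is a minimal sketch, not the book's notation: outcomes are written as three-letter strings, and the helper `event` is a hypothetical name for "the subset of S on which a condition holds."

```python
from itertools import product

# Sample space (2.1): all strings of three letters drawn from H and T.
S = {"".join(t) for t in product("HT", repeat=3)}


def event(condition):
    """Subset of S consisting of the outcomes for which the condition holds."""
    return {o for o in S if condition(o)}


A = event(lambda o: o.count("H") > o.count("T"))  # heads exceed tails
B = event(lambda o: o.count("H") == 2)            # exactly two heads
C = event(lambda o: o.count("H") >= 2)            # at least two heads
D = event(lambda o: o[1] == "H")                  # second toss is heads

print(sorted(D & B))  # second toss heads AND exactly two heads
print(sorted(D | B))  # second toss heads OR exactly two heads
print(sorted(S - D))  # complement of D: second toss is not heads
```

Intersection, union, and complement relative to S play the roles of "and," "or," and "not" in the verbal descriptions.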


In the light of this example, we introduce the following general 
terminology. 


Definition 2.1. Let a sample space S be given. An event is a subset
of S. We say the event E occurs if the outcome of the experiment
corresponds to an element of the subset E.


Because of this definition, the language and notation of set theory
can be expected to find extensive use in the theory of probability. To
illustrate this point, suppose an experiment specified by the sample
space S results in an outcome denoted by the element $o \in S$. The
reader must be certain that he understands the correspondence be-
tween the everyday language on the left and the set language and
symbolism on the right in the following glossary:


Event E                                    Subset E of the sample space S
Event F                                    Subset F of the sample space S
Event E occurs                             $o \in E$
Complementary event of E (not-E)           $E'$ (the complement of set E)
Event E does not occur                     $o \in E'$
Event E or event F                         $E \cup F$ (the union of sets E and F)
Either event E or event F occurs
  (at least one of E and F occurs)         $o \in (E \cup F)$
Event E and event F                        $E \cap F$ (the intersection of sets E and F)
Both event E and event F occur             $o \in (E \cap F)$
Event E is impossible                      $E = \emptyset$
Event E is certain                         $E = S$
E and F are mutually exclusive events      $E \cap F = \emptyset$
If event E occurs, then event F
  occurs (E implies F)                     $E \subset F$ (E is a subset of F)

Because of its intuitive appeal, we shall continue to use the every-
day language listed in the left column. But let us recognize that each
such phrase is defined by the corresponding set-theoretic equivalent
in the right column and is thereby given precise meaning within our
mathematical theory.


Example 2.1. We return to the experiment in which r people are
selected and their birthdays recorded. In Example 1.2, we defined a
sample space S for this experiment and found that it had $365^r$ ele-
ments. Now let E be the event that at least two among the r people
selected have the same birthday. We want to determine $n(E)$, the
number of elements in the subset E. It turns out to be easier to cal-
culate $n(E')$, the number of elements in the complementary event of
E. We then use the formula (Cf. Problem I.3.7)

$n(E) + n(E') = n(S) = 365^r$

to determine $n(E)$.

Now $E'$ is the event that no two among the r people selected have
the same birthday. Hence, $n(E')$ is equal to the number of ways of
selecting r different numbers (birthdays), each being chosen from the
full set of 365 possible birthdays. The first man's birthday can be
chosen in 365 ways, the second man's in 364 ways, the third man's in
363 ways, and so on until we select the rth man. His birthday, in
order for it to be different from all the others, can be chosen in only
$365 - (r - 1)$ or $365 - r + 1$ ways. Invoking the fundamental
principle of counting, we conclude that

$n(E') = 365 \cdot 364 \cdot 363 \cdots (365 - r + 1).$

Finally, we find

(2.2)  $n(E) = 365^r - 365 \cdot 364 \cdot 363 \cdots (365 - r + 1)$

for the number of different ways that the r selected people include
at least two having the same birthday. (Continued in Example 3.6.)
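
Formula (2.2) can be sanity-checked by brute force on a shortened "year," where listing every r-tuple is feasible. The sketch below is an illustration under that assumption; the parameter `days` and the function names are hypothetical, and `math.perm(days, r)` is used for the product days · (days − 1) ⋯ (days − r + 1).

```python
from itertools import product
from math import perm


def n_E_formula(days, r):
    """Formula (2.2) with 365 replaced by an arbitrary number of days."""
    return days**r - perm(days, r)


def n_E_bruteforce(days, r):
    """Count r-tuples of birthdays in which at least two entries coincide."""
    return sum(
        1
        for t in product(range(days), repeat=r)
        if len(set(t)) < r  # fewer distinct values than entries: a repeat occurred
    )


# Small cases where enumeration is cheap.
for days, r in [(5, 3), (7, 4), (10, 2)]:
    print(days, r, n_E_formula(days, r), n_E_bruteforce(days, r))
```

The two counts should agree in every case; for the real problem only the formula is practical, since the sample space has $365^r$ elements.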


PROBLEMS 


2.1. Refer to the sample space of Problem 1.1, part (a), and determine the 


subsets defining the following events. 


(a) The card selected is a spade.
(b) The card selected is a jack, queen, or king. 
(c) The card selected is the ace of spades. 

2.2. A green and a red die are thrown. (Cf. Example 1.1.) Let A be the
event that the sum of the numbers on the faces is even, and B the event
that the number on the green die is odd.

(a) List the elements of subsets A and B.
(b) Give a concise verbal description of the event $A \cap B$.
(c) How many elements of the sample space S are in the event $A \cup B$?

2.3. Refer to the sample space S in Example 1.2. Let E be the event that
the first person selected is born on January 3 and F that the second
person selected is born on January 28. (a) Write E, F, and $E \cap F$ as
Cartesian product sets. (b) Count the number of elements in E, F,
$E \cap F$, and $E \cup F$.

2.4. Let A, B, and C be any events of a sample space S. Using only the
symbols $\cap$, $\cup$, $'$, A, B, C, write expressions for the events that of A,
B, and C:

(a) At least one occurs.         (b) Only A occurs.
(c) A and B occur, but not C.    (d) All three occur.
(e) None occurs.                 (f) Exactly one occurs.
(g) Exactly two occur.           (h) At most two occur.

(Hint: Refer to Figure 7 on p. 21 and determine the region representing
each event.)

2.5. Refer to the sample space of Problem 1.1(e) and let E be the event 
that the first cell is empty, F the event that the second cell is empty, 
and G the event that the second cell contains both objects. Show that 
the following relations among these events are true. 

(à EOF -f () FC@ 
(b ECG (d) S= F'U P 
2.6. Let S be the sample space defined in Problem 1.7. Suppose E is the 


event that the first urn is selected, and F that a white ball is drawn. 


Describe the following events in words, and list their elements: E O F, 
BE AF, Fr, EXJ F. 


3. The probability of an event 


If a (real or conceptual) experiment is under consideration, there 
are many events in whose occurrence we may have some interest. 
Using our mathematical language, this amounts to saying that if a 
sample space S is given, then we can form many subsets of S. In 
fact, if 
(3.1)  $S = \{o_1, o_2, \dots, o_n\}$

is a finite sample space containing the n elements $o_1, o_2, \dots, o_n$, then
there are $2^n$ different subsets of S and since each subset is an event,
there are $2^n$ different events. In this section, we are finally in a po-
sition to define what is meant by “the probability of an event.”
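
To make the count of $2^n$ events concrete, every subset of a small sample space can be listed. This is a sketch using the standard `itertools.combinations`; the function name `all_events` and the choice of the two-coin sample space are illustrative assumptions.

```python
from itertools import combinations


def all_events(sample_space):
    """Return every subset of the sample space, from the empty set up to S itself."""
    elems = list(sample_space)
    return [
        set(c)
        for k in range(len(elems) + 1)   # subset sizes 0, 1, ..., n
        for c in combinations(elems, k)  # all subsets of that size
    ]


S = ["HH", "HT", "TH", "TT"]
events = all_events(S)
print(len(events), 2 ** len(S))
```

The list includes the empty set and S itself, so both the impossible event and the certain event are among the $2^n$ events.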
Our first step is to distinguish certain special events that form
building blocks from which other events can be constructed.

Definition 3.1. Let a sample space S be given. We shall mean by a
simple event a unit subset of S, i.e., a subset containing only one ele-
ment.

With S defined in (3.1), there are exactly n simple events, viz.,

(3.2)  $\{o_1\}, \{o_2\}, \dots, \{o_n\}.$

The event $\{o_1, o_2\}$ is not a simple event, but it is the union of two
simple events. In fact, we have

$\{o_1, o_2\} = \{o_1\} \cup \{o_2\}.$

Similarly, the event

$\{o_3, o_4, o_5\} = \{o_3\} \cup \{o_4\} \cup \{o_5\}.$

Thus we see that all nonempty events are either simple events or unions
of two or more different simple events. In addition to the nonempty
events there is also the null event $\emptyset$. The union of all simple events
enumerated in (3.2) is the entire sample space, i.e.,

$S = \{o_1\} \cup \{o_2\} \cup \cdots \cup \{o_n\}.$


Probabilities are assigned to the simple events first. 
Definition 3.2. Let the sample space S be defined as in (3.1). To
each simple event $\{o_j\}$ we assign a number denoted by $P(\{o_j\})$ and
called the probability of the event $\{o_j\}$. These numbers (probabilities)
can be assigned arbitrarily, except that they must satisfy the follow-
ing two conditions:

(i) The probability of each simple event is a nonnegative number,
i.e.,

(3.3)  $P(\{o_j\}) \ge 0 \quad (j = 1, 2, \dots, n).$

(ii) The sum of the probabilities assigned to all simple events of
the sample space is 1. In symbols,

(3.4)  $\sum_{j=1}^{n} P(\{o_j\}) = P(\{o_1\}) + P(\{o_2\}) + \cdots + P(\{o_n\}) = 1.$

We shall say that an assignment of probabilities to the simple events
of S is acceptable if it satisfies (i) and (ii).

In view of condition (ii), it is clear that the probability of each
simple event is not only at least 0, as required by condition (i), but
also at most 1; i.e.,

(3.5)  $0 \le P(\{o_j\}) \le 1 \quad (j = 1, 2, \dots, n).$
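
Conditions (i) and (ii) are easy to test for any proposed assignment. The following sketch assumes an assignment is stored as a dictionary from outcomes to exact fractions (a representation chosen here for illustration, so that the sum in (3.4) is compared with 1 without rounding error); `is_acceptable` is a hypothetical name.

```python
from fractions import Fraction


def is_acceptable(assignment):
    """Check conditions (i) and (ii) for probabilities of the simple events."""
    values = list(assignment.values())
    nonnegative = all(p >= 0 for p in values)  # condition (i), Formula (3.3)
    sums_to_one = sum(values) == 1             # condition (ii), Formula (3.4)
    return nonnegative and sums_to_one


print(is_acceptable({"H": Fraction(1, 2), "T": Fraction(1, 2)}))
print(is_acceptable({"H": Fraction(1), "T": Fraction(0)}))
print(is_acceptable({"H": Fraction(3, 2), "T": Fraction(-1, 2)}))  # violates (i)
print(is_acceptable({"H": Fraction(1, 2), "T": Fraction(1, 3)}))   # violates (ii)
```

Note that the function returns only yes or no: an assignment is either acceptable or not, with no degrees in between.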
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It is most important to recognize that in spite of these restricting 
conditions, there are many possible assignments of probabilities to 
the simple events. We give some examples. 


Example 3.1. Let a coin be tossed. We define the sample space
$S = \{H, T\}$ and so have two simple events: $\{H\}$ and $\{T\}$. Each of
the following assignments of probabilities to these simple events is
acceptable:

(1) $P(\{H\}) = P(\{T\}) = \frac{1}{2}$,
(2) $P(\{H\}) = \frac{1}{3}$ and $P(\{T\}) = \frac{2}{3}$,
(3) $P(\{H\}) = 1$ and $P(\{T\}) = 0$.

In fact, if p is any real number between 0 and 1 inclusive, then

$P(\{H\}) = p$ and $P(\{T\}) = 1 - p$

is an acceptable assignment of probabilities to the two simple events
$\{H\}$ and $\{T\}$. Therefore, we see that there are infinitely many pos-
sible acceptable assignments, one for each choice of the number p.

Most people would find the choice $P(\{H\}) = P(\{T\}) = \frac{1}{2}$ the
“natural” choice. We do not go into the psychological reasons behind
this feeling, but merely make three points:

(1) This choice is neither more nor less acceptable than any other
acceptable choice. An assignment of probabilities to simple events is
either acceptable or not; there are no degrees of acceptability. Defi-
nition 3.2 requires only that we meet the two conditions for assigning
probabilities to simple events.

(2) This “natural” choice is not dictated by experience with real
coins. Experience with real coins shows that they usually fall short
of being “ideal” coins: they are not perfectly circular and symmetri-
cally weighted; heads and tails are not equally likely.

(3) Nevertheless, we often do choose to develop the theory for
such “ideal” or “fair” coins since, as we shall see, the theory then is
both logically and psychologically appealing. But more important is
the fact that the theory for “ideal” coins supplies a basis for testing
real coins and deciding, depending upon the extent to which the pre-
dictions of the theory are borne out by the empirical results with the
real coin, whether it is reasonable to assume that the real coin is
“fair.” This last point leads us to expect correctly that probability
theory will prove useful in problems of statistical inference.


Example 3.2. A green and red die are rolled, and we use as sample
space the set S defined in Example 1.1. There are 36 simple events.
The natural assignment of probabilities to these simple events is
the one in which each simple event is assigned probability $\frac{1}{36}$. This
is an acceptable assignment, since both required conditions are ful-
filled: the probability of each simple event is nonnegative and the
sum of the probabilities of all simple events is 1. Of course here too
there are infinitely many other acceptable assignments of probabilities
to the 36 simple events.


Example 3.3. A person is selected from the population of a certain
country and asked the question, “Do you think there will be another
world war?” We classify each answer into one of the three categories
“Yes,” “No,” “Don’t Know.” Our sample space S contains three
elements, one for each possible answer:

$S = \{Y, N, DK\}.$

There are three simple events. This example illustrates the fact that
we often do not have any basis for the assignment of probabilities to
the simple events of an experiment. In the absence of information
about the opinions of people in the country, we can do no more than
guess appropriate values for these probabilities. However, we can
make the assignment

$P(\{Y\}) = p, \quad P(\{N\}) = q, \quad P(\{DK\}) = r,$

where we know only that p, q, and r are nonnegative real numbers
whose sum is 1,

$p \ge 0, \quad q \ge 0, \quad r \ge 0, \quad p + q + r = 1.$

If we are told that 60 percent of the population of the country expect
another world war, 30 percent do not expect another world war, and
10 percent are uncertain, then it seems natural to choose p = 0.6,
q = 0.3, and r = 0.1.


It is now an easy step to define the probability of any event. Let
a finite sample space S be given, and suppose an acceptable assign-
ment of probabilities has been made for the simple events of S. Let
E be any event. Then E is either (i) the empty set $\emptyset$, (ii) a simple
event, or (iii) the union of two or more different simple events. In
case (ii) the probability of E has already been assigned. The follow-
ing definitions take care of cases (i) and (iii).

Definition 3.3. The probability of the empty set $\emptyset$, denoted by
$P(\emptyset)$, is defined to be zero; i.e., $P(\emptyset) = 0$.

Definition 3.4. If E is the union of two or more different simple 
events, then the probability of E, denoted by P(E), is the sum of the 
probabilities of those simple events whose union is E. (It is under- 
stood that each simple event is counted exactly once.) 


Example 3.4. A green and a red die are rolled, and we choose the 
sample space S containing the by now familiar 36 ordered pairs. 
Let us assign the probability $\frac{1}{36}$ to each of the 36 simple events of S.
If A is the event “sum of numbers on dice is 7,” then

$A = \{(1, 6)\} \cup \{(2, 5)\} \cup \{(3, 4)\} \cup \{(4, 3)\} \cup \{(5, 2)\} \cup \{(6, 1)\}.$

Hence, by Definition 3.4, we find

$P(A) = \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{1}{6}.$

Similarly, if B is the event “sum of numbers on dice is 11,” then

$B = \{(5, 6)\} \cup \{(6, 5)\}$

and, again by Definition 3.4,

$P(B) = \frac{1}{36} + \frac{1}{36} = \frac{1}{18}.$

Example 3.5. A card is selected from a standard deck of 52 cards- 
We take the set D in (1.3) as sample space, and seek the probability 
that the card selected is a spade. Let us assign equal probabilities to 
each of the 52 simple events of D. Then each simple event has prob-
ability $\frac{1}{52}$. The event that the card is a spade is the union of the fol-
lowing 13 simple events:

$\{A_s\}, \{K_s\}, \{Q_s\}, \dots, \{2_s\}.$

Hence

$P(\text{card selected is a spade}) = \frac{1}{52} + \frac{1}{52} + \cdots + \frac{1}{52} = \frac{13}{52} = \frac{1}{4}$   (13 times).

Similarly, the event that the card selected is an ace or a spade is the
union of the following 16 simple events: $\{A_s\}, \{A_h\}, \{A_d\}, \{A_c\}, \{K_s\},$
$\{Q_s\}, \dots, \{2_s\}$. Hence

$P(\text{card selected is an ace or a spade}) = \frac{1}{52} + \frac{1}{52} + \cdots + \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$   (16 times).
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Example 3.6. We continue the birthday problem of Example 2.1 
and now compute the probability of the event E that at least two
among the r people selected have the same birthday. We assume that
each ordered r-tuple of the sample space S is as likely as any other to
represent the outcome of the experiment. (This assumption would
of course be false if the r people were selected at a convention of
twins. But even if the selection is made from the entire population
of the United States, the fact that the proportion of all births occur-
ring in a given month varies from month to month still makes our
mathematical model only a first approximation to the actual state
of affairs.) In any case, we assign to each of the $365^r$ simple events of
S the same probability $1/365^r$. Since the number of simple events
in E is equal to $n(E)$, the number of elements in E (why?), it follows
from Definition 3.4 that $P(E)$ is the sum of $n(E)$ probabilities, each
equal to $1/365^r$. Hence

$P(E) = n(E) \cdot \frac{1}{365^r},$

and if we substitute the expression for $n(E)$ found in Formula (2.2),
we obtain

(3.6)  $P(E) = 1 - \frac{365 \cdot 364 \cdots (365 - r + 1)}{365^r}.$

The probability $P(E)$ is given (to two decimal-place accuracy) for
various values of r in Table 10. Note the rather surprising fact that
in as small a group as 23 people, the probability of finding at least
two people with the same birthday is greater than $\frac{1}{2}$.


TABLE 10

Example 3.7. Two coins are tossed. We define the sample space
given by

$S = \{HH, HT, TH, TT\}$

and ask for the probability of the event E that at least one head
occurs.

Solution 1. We assign probability $\frac{1}{4}$ to each of the four simple
events of S. The event E is the union of three simple events,

$E = \{HH\} \cup \{HT\} \cup \{TH\}.$

Hence $P(E) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}$.

Solution 2. We assign probabilities to the simple events as follows:

$P(\{HH\}) = P(\{HT\}) = \frac{1}{2}, \quad P(\{TH\}) = P(\{TT\}) = 0.$

The event E is still the union of the same three simple events, but
now

$P(E) = \frac{1}{2} + \frac{1}{2} + 0 = 1.$

These two solutions yield different values for the probability of the
Same event E. This should not be disturbing since, according to 
Definition 3.4, the probability of an event depends on the previous 
assignment of probabilities to the simple events. With different ac- 
ceptable assignments of probabilities to the simple events, as in 
Solutions 1 and 2 of Example 3.7, we have no reason to be surprised 
if an event E turns out to have different probabilities. Which as- 
signment of probabilities to the simple events should be made is not 
a mathematical question, but one that depends upon our assessment 
of the real-world situation to which the theory is to be applied. The 
assignment in Solution 1 is the natural one for unbiased, “fair” coins, 
but the assignment in Solution 2 is more sensible if one is sure that 
the first coin is loaded so as to turn up heads all the time. 
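
The dependence of P(E) on the chosen assignment can be made concrete in a few lines. The sketch below is an illustration only; the names fair, loaded, and prob are ours and do not appear in the text. It applies Definition 3.4 to the same event E under the two assignments of Example 3.7.

```python
from fractions import Fraction

S = ("HH", "HT", "TH", "TT")

# Solution 1: every simple event gets probability 1/4.
fair = {outcome: Fraction(1, 4) for outcome in S}

# Solution 2: the first coin always shows heads.
loaded = {"HH": Fraction(1, 2), "HT": Fraction(1, 2),
          "TH": Fraction(0), "TT": Fraction(0)}


def prob(event, assignment):
    """Definition 3.4: add the probabilities of the simple events in `event`."""
    return sum((assignment[o] for o in event), Fraction(0))


E = {"HH", "HT", "TH"}  # at least one head

if __name__ == "__main__":
    print(prob(E, fair), prob(E, loaded))
```

The event E and the function prob are identical in both calls; only the dictionary of probabilities changes, which is exactly the point of the two solutions.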


We conclude with three remarks. 

(1) Statements in the theory of probability, as in all of mathe- 
matics, are of the conditional form; i.e., “If such and such is assumed, 
then such and such follows.” Ordinarily, when we select a card from
a full deck and inquire, “What is the probability of selecting a
spade?” we answer, as in Example 3.5, “The probability of selecting
a spade is 1/4.” But a complete answer would read as follows: “If we
choose as sample space the set D containing 52 elements, one for
each card in the deck, and if we assign equal probabilities of 1/52 to
each of the 52 simple events of D, then the probability that a spade
is selected is 1/4.” The antecedents of this conditional assertion are
usually omitted and only the consequent is stated, when it is clear 
from the context which sample space and which assignment of prob- 
abilities to simple events have been chosen. But if two people do the 
same problem and get different numbers for the probability of the
same event, it becomes important to spell out the hypotheses each 
used. Each answer may logically follow from the hypothesis em- 
ployed in its derivation, and the answers may differ because different 
sample spaces or different assignments of probabilities to simple 
events were made. However, if the same sample space is used and 
the simple events are assigned the same probabilities by both people, 
then two different values for the probability of the same event can 
only mean that at least one of the people has committed a logical 
error. 

The situation here is analogous to that in plane geometry. The 
assertions “The sum of the angles of a triangle is 180 degrees" and 
“The sum of the angles of a triangle is less than 180 degrees" cer- 
tainly differ, but we cannot say whether they are true or false, since 
neither is a complete mathematical assertion. The hypotheses must 
be explicitly stated or implicitly understood. Thus, the first
statement is true if it is intended to read, “If the axioms of Euclidean
geometry are accepted, then the sum of the angles of a triangle is 180
degrees.” And the second statement is true if we expand it to read,
“If the axioms of Lobachewskian geometry are accepted, then the
sum of the angles of a triangle is less than 180 degrees.” Both
Euclidean and non-Euclidean geometries are fruitful mathematical
theories, and since their premises differ, nobody is now disturbed
when their conclusions also differ. Which geometry should be used in
a particular context is not a mathematical question, but one that is
of great interest to those (like physicists) who apply geometry to the
real world.

(2) Our definitions are so framed that events and only events have
probabilities. Some authors prefer to formulate the theory so that
statements are assigned probabilities. Except for linguistic differences,
[...]

(3) The assignment of probabilities to the simple events is often
implied by the use of certain adjectives. When we say that a
pair of “fair” or “unbiased” dice are thrown, we mean to insist that
each of the 36 simple events of the familiar sample space be assigned
probability 1/36. When we say that a number is selected “at random”
from n different numbers, then we agree to assign probability 1/n to
each of the n simple events of the appropriate sample space. Further
examples will be discussed later, but for the present we adopt this
convention in the problems that follow.


PROBLEMS 


3.1. Consider the dice experiment of Example 3.4 and make the “natural”
assignment of probabilities to the 36 simple events, so that each has
probability 1/36. Find the probability of the following events.


(a) The sum of the numbers on the dice is less than 4. 
(b) One die gives a 3 and the other die a number less than 3. 
(c) The sum of the numbers on the dice is 2 or 12. 


3.2. A letter of the alphabet is chosen at random. Find the probability
of the event that the letter selected


(a) is a vowel. 

(b) is a consonant. 

(c) precedes u (in alphabetical order) and is a vowel. 
(d) follows t and is a vowel. 

(e) follows v and is a vowel. 


3.3. A committee of three is selected from six people A, B, C, D, E, and F.
(Cf. Problem 1.2.)


(a) Specify a suitable sample space S and make an acceptable assign- 
ment of probabilities to the simple events of S. 

(b) Find the probability that A is selected. 

(c) Find the probability that A and B are selected. 

(d) Find the probability that A or B is selected.

(e) Find the probability that A is not selected. 

(f) Find the probability that neither A nor B is selected. 


3.4. Let the sample space S = {o_1, o_2, o_3, o_4} be given. Probabilities are
assigned to the simple events so that

P((o)) = P((o3)), P((o3) = P((o3) = 2P((03). 
Find P({o;, 03}). 


3.5. In each of the following, specify an appropriate sample space S, assign
probabilities to the simple events of S, and then find the required
probability.

(a) Find the probability of obtaining exactly two tails if three coins
are tossed.

(b) Find the probability that one cell is empty when two distinguishable
objects are distributed in two cells. [Cf. Problem 1.1(d).]

(c) Find the probability that one cell is empty when two indistinguishable
objects are distributed in two cells. [Cf. Problem 1.1(e).]

(d) Find the probability of finding a family with no boys among
families with two children; with three children; with r children.
[Cf. Problem 1.1(f)-(h).]

(e) Find the probability of all coins falling heads when r coins are
tossed. [Cf. Problem 1.1(i).]

(f) A die is loaded in such a way that the probability of the face
marked j turning up is proportional to j for j = 1, 2, ..., 6. Find
the probability that an odd number turns up when the die is rolled.

(g) A month of the year is randomly selected, and we note the day of
the week on which the 13th day of the month falls. Find the
probability that this 13th day falls on a Sunday.

3.6. Refer to Table 4 (p. 24) reporting the result of a survey of 321 union
men. Let one man be selected at random from this group of 321 men.
Decide on a suitable sample space and assignment of probabilities to its
simple events, and then find the probability that the man selected

(a) answers “yes.”
(b) answers “yes” and is in the union four or more years.
(c) answers “don't know” and is in the union less than four years.
(d) answers “don't know” and is in the union over 10 years.


3.7. Refer to Problem 2.3 and find the probability of the events E ∩ F and
E ∪ F. Make the same assignment of probabilities to simple events
of S as was made in Example 3.6.

3.8. Find the probability that among r people, there will be at least one
whose birthday is the same as yours. Use logarithms, or otherwise
determine the smallest value of r for which this probability is at least 1/2.

3.9. Three squares numbered 1, 2, and 3 are marked on a table. A deck of 
three cards numbered 1, 2, and 3 is shuffled and then dealt so that one 
card appears in each numbered square. This experiment can be thought 
of as resulting in a permutation of the numbers 1, 2, and 3.

(a) Enumerate the six permutations that make up the sample space.

(b) Assign equal probabilities to each simple event.

(c) If card number j is dealt so as to fall in square j, we say that a
match occurs in square j. Let E_j be the event that a match
occurs in square j. (Of course, j can be equal to 1, 2, or 3.) Compute
the probability of each of the following events: E_1, E_2, E_3,
E_1 ∪ E_2, E_1 ∩ E_2, E_1 ∩ E_2 ∩ E_3, E_1 ∪ E_2 ∪ E_3.

(d) Is P(E_1 ∪ E_2) equal to P(E_1) + P(E_2)? Why not? Can you write
a correct formula for P(E_1 ∪ E_2)?

3.10. Find the probability that the player wins in each of the following 
lotteries. For each lottery, first define a sample space and assign prob- 
abilities to its simple events. 

(a) Two white and four black balls are placed in an urn and thoroughly
stirred. The player draws one ball and wins if the ball he draws
is black.

(b) Same as (a), except that the player tosses a coin with one face
painted white and the other face painted black just before he draws
the ball. He wins if the ball drawn is of the color he tossed with
the coin.


4. Some probability theorems 


We now derive some consequences of the definitions given in the
preceding section. We assume throughout that a finite sample space
S is given,

$$S = \{o_1, o_2, \ldots, o_n\}, \tag{4.1}$$

and that some acceptable assignment of probabilities to the simple
events of S has been made.

Because it will be helpful in visualizing the results to be proved, we
pause to reformulate the definitions of the preceding section in a more
picturesque language. We use the basic idea of a Venn diagram to
represent the sample space S and events (subsets) of S. But now we
use dots to indicate elements of S. Each dot determines one simple
event, namely, the simple event containing as its only member
the element represented by the dot. We imagine a flag erected
at each dot and on this flag we write the probability assigned to
the simple event determined by this dot. The flag erected at the
dot representing outcome o_1 flies the number P({o_1}), the flag
erected at the dot representing outcome o_2 flies the number
P({o_2}), etc. See Figure 11. Because of Definition 3.2, the number
on each flag is nonnegative and the sum of the numbers on all the
flags is 1. We now show that we can paraphrase our definitions as
follows: The probability of E is the sum of the numbers on the flags
erected at elements of E.

Figure 11

If E = ∅, then no dots, and therefore no flags, appear in the region
representing E. In order to have P(∅) = 0, as required by Definition
3.3, we shall agree to say that the sum of the numbers on the flags is
0 when there are in fact no flags. If E is a simple event, then only
one dot appears in the region representing E and the flag erected at
that dot carries precisely the number P(E). If E is the union of two
or more, say k, different simple events, then the region representing
E contains exactly k dots and, when we add the numbers on the flags
erected at these k dots, we are adding the probabilities of the simple
events whose union is E. By Definition 3.4, we obtain P(E). Thus
we have shown that the italicized phrase concluding the preceding
paragraph is correct for all events E. We shall therefore feel free to
use this picturesque “numbers on flags” language whenever it seems
helpful.
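
The “numbers on flags” picture translates almost word for word into a small data structure. In the sketch below, which is ours and not the author's, a dictionary maps each outcome (dot) to the number on its flag; the names is_acceptable and flag_sum, and the particular four-point sample space, are hypothetical choices made only for illustration.

```python
from fractions import Fraction


def is_acceptable(flags):
    """Definition 3.2: every flag is nonnegative and the flags sum to 1."""
    return all(p >= 0 for p in flags.values()) and sum(flags.values()) == 1


def flag_sum(event, flags):
    """Sum of the numbers on the flags erected at the elements of `event`.
    With no flags the sum is 0, matching the convention P(empty set) = 0."""
    return sum((flags[o] for o in event), Fraction(0))


# Hypothetical four-point sample space with arbitrarily chosen flags.
flags = {"o1": Fraction(1, 2), "o2": Fraction(1, 4),
         "o3": Fraction(1, 4), "o4": Fraction(0)}

if __name__ == "__main__":
    print(is_acceptable(flags))
    print(flag_sum(set(), flags), flag_sum({"o1"}, flags),
          flag_sum({"o1", "o2"}, flags))
```

The three calls to flag_sum correspond to the three cases treated above: the empty event, a simple event, and a union of several simple events.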


Theorem 4.1. P(S) = 1; i.e., the probability of a certain event is 1.

Proof. The sample space S is the union of all n simple events,

$$S = \{o_1\} \cup \{o_2\} \cup \cdots \cup \{o_n\}.$$

Hence, by Definition 3.4,

$$P(S) = P(\{o_1\}) + P(\{o_2\}) + \cdots + P(\{o_n\}),$$

and this sum is 1 by our agreement (in Definition 3.2) that the sum of
the probabilities assigned to all simple events must be 1.

Using our picturesque language, the proof of Theorem 4.1 is equally
easy. For to find P(S) we must add the numbers on the flags erected
at elements of S. But this means adding the numbers on all flags,
and we know this sum is 1.

Theorem 4.2. If E and F are events such that E ⊆ F, then
P(E) ≤ P(F); i.e., if E implies F, then the probability of E cannot
exceed the probability of F.

Proof. By hypothesis, each element of E is also an element of F.
Hence, each simple event among those whose union is E is also a
simple event among those whose union is F. Since the probability of
each simple event is nonnegative, it follows that the sum of the
probabilities of the simple events of F is at least as large as the sum
of the probabilities of the simple events of E. But this is precisely
the required conclusion.

In a Venn diagram, all the points representing elements of E would
also be in the region corresponding to F. To find P(E) we add the
numbers on the flags erected at points of E. All of these numbers as
well as those, if any, that are erected at points in F but outside of E,
are summed to get P(F). Hence P(E) ≤ P(F), as before.

Theorem 4.2 says that if event F occurs whenever event E occurs,
then the probability of F is at least as large as the probability of E.


Theorem 4.3. If E is any event, then 0 ≤ P(E) ≤ 1.

Proof. We have E ⊆ S, since an event is by definition a subset of
the sample space S. Hence by Theorems 4.1 and 4.2, we conclude
that P(E) ≤ P(S) = 1. Also ∅ ⊆ E, so that Theorem 4.2 yields
P(∅) ≤ P(E). Since P(∅) = 0, our proof is complete.

The extreme values 0 and 1 are worthy of special attention. We
know that P(∅) = 0 and P(S) = 1. Recalling the definition of
impossible and certain events given in the glossary on p. 52, we can
say that if an event is impossible, then it has probability 0, and if an
event is certain, then it has probability 1. But the converse of each
of these implications is false; i.e., if P(E) = 0 we cannot conclude that
E is impossible, and if P(E) = 1 we cannot conclude that E is certain.
For example, in Solution 2 of Example 3.7, the event that the first
coin falls tails is {TH, TT}, certainly not the empty set. Yet, by
our assignment of probabilities to the simple events, this event has
probability 0. In that same example, the event {HH, HT} has
probability 1, but is not certain since it is not the entire sample space.

The reason for this state of affairs is that we have allowed simple 
events to be assigned probability 0. If we insisted, as some authors 
do, that the probability of each simple event must be positive, then 
only the empty event would have probability 0, and only the whole 
sample space would have probability 1. However, it turns out to be 
the case in problems involving infinite sample spaces that there must 
exist events that are not impossible but yet have probability 0. Al- 
though we cannot pursue this matter here, our definitions are formu- 
lated in such a way that the reader need not be surprised by this fact 
when he goes on to study probabilities in infinite sample spaces. 


Theorem 4.4. Let E and F be two events. Then

$$P(E \cup F) = P(E) + P(F) - P(E \cap F). \tag{4.2}$$

In words, the probability that at least one of the events E and F
occurs is obtained by adding the probability that E occurs and the
probability that F occurs, and then subtracting the probability that
both E and F occur.

Proof. First add the probabilities of all simple events containing
elements of E. Their sum is P(E). Then add the probabilities of all
simple events containing elements of F. Their sum is P(F). In the
sum P(E) + P(F) we have included P({o_j}) if and only if o_j ∈ E ∪ F.
But we have added P({o_j}) twice for every o_j ∈ E ∩ F: once in the
sum P(E), and again in the sum P(F). The sum of the probabilities
of simple events that are counted twice is P(E ∩ F). We conclude
that P(E) + P(F) - P(E ∩ F) is precisely the sum of the
probabilities of all simple events in E ∪ F, each counted once. Since
this sum is P(E ∪ F), the theorem is proved.

The reader should draw a Venn diagram and test his understanding
of this proof by formulating each step in the “numbers on flags”
language. Before illustrating how Theorem 4.4 is used in a particular
example, we deduce two more results.

Theorem 4.5. If E and F are mutually exclusive events, then

$$P(E \cup F) = P(E) + P(F). \tag{4.3}$$

Proof. In the result of Theorem 4.4, we have only to note that now
P(E ∩ F) = P(∅) = 0.

Theorem 4.5 says that the probability of the occurrence of at least
one of two mutually exclusive events is the sum of their individual
probabilities. Let us not forget the italicized hypothesis that must be
true before using Formula (4.3). The use of Formula (4.2) requires
no such caution, since it holds for any two events.

Theorem 4.6. Let E and E' be any complementary events. Then

$$P(E') = 1 - P(E). \tag{4.4}$$

In words, the probability that E does not occur is obtained by
subtracting from 1 the probability that E does occur.

Proof. E and E' are mutually exclusive events, since E ∩ E' = ∅.
Hence by (4.3),

$$P(E \cup E') = P(E) + P(E').$$

But E ∪ E' = S, the entire sample space, and P(S) = 1 by Theorem
4.1. Hence

$$1 = P(E) + P(E'),$$

which is equivalent to (4.4).

In our less formal language, (4.4) is merely the result of noting that
we obtain the sum of the numbers on all flags in S by adding the sum
of the numbers on flags in E to the sum of the numbers on flags not
in E.
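
Formulas (4.2) and (4.4) can be checked on the card experiment of Example 3.5. The following sketch is only an illustration under the equal assignment of 1/52 to each simple event; the names DECK, P_SIMPLE, and prob are our own. It takes E to be “the card is an ace” and F to be “the card is a spade,” so that E ∪ F is the 16-element event considered earlier.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = set(product(RANKS, SUITS))       # 52 elements, one per card
P_SIMPLE = Fraction(1, len(DECK))       # equal assignment, 1/52 each


def prob(event):
    """Add 1/52 once for each simple event contained in `event`."""
    return sum((P_SIMPLE for _ in event), Fraction(0))


E = {card for card in DECK if card[0] == "A"}        # card is an ace
F = {card for card in DECK if card[1] == "spades"}   # card is a spade

if __name__ == "__main__":
    # Theorem 4.4: both sides of (4.2).
    print(prob(E | F), prob(E) + prob(F) - prob(E & F))
    # Theorem 4.6: both sides of (4.4).
    print(prob(DECK - E), 1 - prob(E))
```

Because E and F share the ace of spades, they are not mutually exclusive, and omitting the term P(E ∩ F) would count that one card twice.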


The following examples illustrate how our formulas can be used to 
compute probabilities. 


Example 4.1. Three coins are tossed. Find the probability of
getting at least one head. We assign equal probabilities to the eight
simple events of the sample space S defined in (2.1), p. 81. If E is
the event “at least one head,” then the complementary event E' is
“no heads.” By Theorem 4.6,

$$P(E) = 1 - P(E') = 1 - P(\{TTT\}) = 1 - \tfrac{1}{8} = \tfrac{7}{8}.$$

Note that we could have computed P(E) directly by recognizing
that E is itself the union of seven simple events. This example is so
simple that either method is easy. But often in more complicated
problems, the most efficient way to find the probability of an event
is first to compute the probability of its complementary event and
then use Formula (4.4). Recall that we followed this procedure in
solving the birthday problem. Although interested in the event E
(at least two people have the same birthday) we found it convenient
(see Example 2.1) to first study the event E' (no two people have the
same birthday).
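
The two routes to P(E) in Example 4.1 can be compared by enumeration. The sketch below is ours; the function name at_least_one_head and the generalization from three coins to r coins are assumptions added for illustration. It returns the probability computed directly and the probability computed through the complementary event.

```python
from fractions import Fraction
from itertools import product


def at_least_one_head(r):
    """Return P(at least one head in r tosses) computed two ways:
    directly, and as 1 minus the probability of the complement."""
    outcomes = list(product("HT", repeat=r))     # 2**r equally likely outcomes
    p = Fraction(1, len(outcomes))
    direct = sum((p for o in outcomes if "H" in o), Fraction(0))
    no_heads = sum((p for o in outcomes if "H" not in o), Fraction(0))
    return direct, 1 - no_heads


if __name__ == "__main__":
    print(at_least_one_head(3))
```

The direct sum runs over all but one of the outcomes, while the complement contains a single outcome for every r, which is why the second route scales so much better.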


Example 4.2. An integer is chosen at random from the first 200
positive integers. What is the probability that the integer chosen is
divisible by 6 or by 8?

Let E be the event “integer selected is divisible by 6” and F the
event “integer selected is divisible by 8.” We are required to find
P(E ∪ F). We define S = {1, 2, 3, ..., 200} and assign probability
1/200 to each simple event of S. Now E contains [200/6]* = 33 integers
and is therefore the union of 33 simple events, each with probability
1/200. Hence P(E) = 33/200. Similarly, F is the union of [200/8] = 25
simple events, so that P(F) = 25/200. Since there are integers among
the first 200 (like 24 and 48) that are divisible by both 6 and 8, the
events E and F are not mutually exclusive. Hence we must compute
P(E ∩ F). An integer is divisible by both 6 and 8 if and only if it
is divisible by 24, the least common multiple of 6 and 8. There are
[200/24] = 8 integers among the first 200 that are divisible by 24. Hence
P(E ∩ F) = 8/200. By applying Formula (4.2), we find the required
probability,

$$P(E \cup F) = \tfrac{33}{200} + \tfrac{25}{200} - \tfrac{8}{200} = \tfrac{50}{200} = \tfrac{1}{4}.$$

(For a generalization of this result, see Problem 4.11.)
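
The arithmetic of Example 4.2 can be verified mechanically. In the sketch below, which is not part of the original text, the function names and the parameters n, a, and b are our own; Python's integer division n // a plays the role of the bracket [n/a]. The first function applies Formula (4.2), and the second counts the favorable integers one by one as an independent check.

```python
from fractions import Fraction
from math import gcd


def prob_divisible_by_either(n, a=6, b=8):
    """P(integer chosen at random from 1..n is divisible by a or b), via (4.2)."""
    lcm = a * b // gcd(a, b)
    count = n // a + n // b - n // lcm   # n // a is the bracket [n/a]
    return Fraction(count, n)


def brute_force(n, a=6, b=8):
    """Same probability by explicit enumeration of the sample space."""
    hits = sum(1 for k in range(1, n + 1) if k % a == 0 or k % b == 0)
    return Fraction(hits, n)


if __name__ == "__main__":
    print(prob_divisible_by_either(200), brute_force(200))
```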


We have seen that in many examples it is reasonable to assign the
same probability to each simple event of the sample space. In this
circumstance, there is a simple formula for the probability of an
event.

Theorem 4.7. Suppose each of the n simple events of the sample
space S in (4.1) is assigned the same probability. (This probability
must then be 1/n.) If E is an event containing f elements, then

$$P(E) = \frac{f}{n}. \tag{4.5}$$

In other words, the probability of an event is the ratio of the number
of elements in the event to the number of elements in the entire
sample space.

Proof. Since E contains f elements, E is the union of f simple events
of the sample space S. Hence, directly from the definition, P(E) is
the sum of f probabilities, each equal to 1/n. But this sum is
precisely f/n, so that our proof is complete.

If we call an outcome of the experiment “favorable” to E whenever
E occurs, then Theorem 4.7 can be paraphrased as follows: If an
experiment can result in n equally likely outcomes, then the probability
of E is the ratio of the number of outcomes favorable to E to the total
number of outcomes. This is the “classic definition” of probability
given by Laplace (1749-1827), one of the first and most important
contributors to the mathematical theory of probability. Let us not
forget that this rule for computing probabilities is applicable only
when all simple events have been assigned the same probability.
Thus, Formula (4.5) does not apply to the wide variety of important
problems where it is not reasonable to make this special assignment
of probabilities to simple events.

* The symbol [x] stands for the greatest integer less than or equal to
the number x. Thus, [3.6] = 3, etc.

Ordinarily, to compute P(E) we must first determine which ele- 
ments of the sample space are in E, and then we add the probabilities 
of the corresponding simple events. But when Theorem 4.7 applies, 
we need only know how many elements are in E. It is therefore ex- 
tremely useful to have effective techniques for counting the elements 
in sets specified by defining properties. In Example 3.6, for instance, 
the probability of the event that at least two people have the same 
birthday was easy to find because we had been able (in Example 2.1) 
to count the elements in this event by using the fundamental princi- 
ple of counting. We discuss some other techniques for counting in 
the next chapter. Until then, our examples will be chosen so as to 
lead to events whose elements can be counted by explicit enumeration 
or by use of the fundamental principle. 
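
To see both the use and the limits of Theorem 4.7, consider the following sketch. It is an illustration of ours, with hypothetical names classical_prob, sum_is_seven, and first_coin_tails. The first computation counts elements in the familiar 36-point sample space for two fair dice. The second applies the same ratio f/n to the loaded assignment of Solution 2 in Example 3.7, where the theorem's hypothesis fails, and compares it with the value given by Definition 3.4.

```python
from fractions import Fraction
from itertools import product

DICE = list(product(range(1, 7), repeat=2))   # 36 ordered pairs


def classical_prob(event, sample_space):
    """Theorem 4.7: f/n, valid only under the equal-probability assignment."""
    return Fraction(len(event), len(sample_space))


sum_is_seven = [pair for pair in DICE if sum(pair) == 7]

# Solution 2 of Example 3.7: unequal assignment, so f/n does not apply.
loaded = {"HH": Fraction(1, 2), "HT": Fraction(1, 2),
          "TH": Fraction(0), "TT": Fraction(0)}
first_coin_tails = ["TH", "TT"]
true_prob = sum((loaded[o] for o in first_coin_tails), Fraction(0))
ratio = classical_prob(first_coin_tails, loaded)

if __name__ == "__main__":
    print(classical_prob(sum_is_seven, DICE))
    print(true_prob, ratio)   # these differ: the ratio is not a probability here
```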

We conclude this section with a brief discussion of the relation
between the probability of an event and “odds” for the event.

Definition 4.1. Let E be any event. We say that odds for E are a
to b if and only if

$$P(E) = \frac{a}{a + b}.$$

If odds for E are a to b, then odds against E are b to a.

Table 11 gives some common odds and corresponding probabilities.

TABLE 11

Odds for E     P(E)

1 to 1         1/2
2 to 1         2/3
3 to 1         3/4
3 to 5         3/8
1 to 2         1/3
12 to 5        12/17

Given the odds for E we have only to apply Definition 4.1 to find
the probability of E. On the other hand, if P(E) is given, we write
it in the fractional form a/(a + b) and then know that the odds for
E are a to b. For example, if P(E) = 0.7, we first write

$$P(E) = \frac{7}{10} = \frac{7}{7 + 3}.$$

Hence, odds for E are 7 to 3. Since odds are often used to express
probabilities of events, it is useful to be able to translate odds to
probabilities and vice versa.
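
The translation in both directions is a one-line computation. The sketch below is ours, and the function names are hypothetical. It assumes the probability is supplied as an exact fraction or a decimal string such as "0.7", since a binary floating-point 0.7 is not exactly 7/10.

```python
from fractions import Fraction


def odds_to_probability(a, b):
    """Definition 4.1: odds of a to b for E mean P(E) = a / (a + b)."""
    return Fraction(a, a + b)


def probability_to_odds(p):
    """Write p = a / (a + b) in lowest terms and return the pair (a, b)."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator


if __name__ == "__main__":
    print(odds_to_probability(7, 3))
    print(probability_to_odds("0.7"))
```

Since Fraction reduces to lowest terms, equal odds such as 10 to 5 and 2 to 1 give the same probability, and the odds returned are always in their simplest form.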


PROBLEMS 


4.1. Two fair dice are rolled. Find the probability of the event E that the
dots on the two uppermost faces do not add to 4. What are odds for E?

4.2. A card is drawn at random from a standard deck of playing cards. Let
E be the event “card selected is an ace” and F the event “card selected
is a spade.”

(a) Are E and F mutually exclusive events?
(b) Find the probability that at least one of the events E and F occurs.
(c) What are odds for the event E ∪ F?

4.3. A fair die is rolled twice. What are odds for the event that at least
one roll yields a number less than 3?

4.4. Odds a to b and c to d are said to be equal if a:b = c:d, i.e., if their
ratios are equal. For example, odds of 10 to 5, 4 to 2, and 2 to 1 are
equal.

(a) Show that if odds for two events are equal, then the events have
equal probabilities.
(b) Show that odds against an event E are equal to odds for the
complementary event E'.

4.5. Odds for event E are 2 to 1. Odds for E ∪ F are 3 to 1. Consistent with
this information, what are the smallest and largest possible values for
the probability of event F?

4.6. A card is drawn at random from an ordinary deck of 52 cards. This
card is replaced, and then another card is selected at random from the
full deck.

(a) Define a suitable sample space for this experiment and assign
probabilities to its simple events.
(b) Find the probability that at least one of the cards selected is the
ace of spades.
(c) What are the odds for the event that neither card is the ace of
spades?

4.7. Repeat the preceding problem, but now assume that the first card is
not replaced before the second is drawn.

4.8. The output of a machine producing nails is known to contain 2%
defectives, the other 98% meeting specifications. From the very large
lot of nails produced by the machine, two nails are drawn at random
and inspected.

(a) Define a suitable sample space for this experiment and make a
reasonable assignment of probabilities to its simple events.
(b) Find the probability that at least one of the nails is defective.

4.9. A high school senior applies for admission to college A and college B.
He estimates that the probability of being admitted to A is 0.7, that
his application will be rejected at B with probability 0.5, and that the
probability of at least one of his applications being rejected is 0.6.
What is the probability that he will be admitted to at least one of the
colleges?

4.10. If in Theorem 4.2 we make the hypothesis that E is a proper subset of F,
i.e., that E ⊆ F but E ≠ F, does it then follow that P(E) < P(F)?

4.11. (a) An integer is chosen at random from the first 20 positive integers.
What is the probability that the integer chosen is divisible by 6
or 8?

(b) An integer is chosen at random from the first 2000 positive integers.
What is the probability that the integer chosen is divisible by 6
or 8?

(c) The result of Example 4.2 in the text together with the results of
parts (a) and (b) should lead you to conjecture a general theorem
of which these results are special cases. State such a theorem and
try to prove it.

4.12. Prove that if E and F are any events, then

$$P(E \cap F) \le P(E) \le P(E \cup F) \le P(E) + P(F).$$

4.13. Let E and F be any two events. Suppose the numbers P(E), P(F), and
P(E ∩ F) are known. Find formulas in terms of these numbers for the
following probabilities. In each case give a verbal description of the
event whose probability you are finding.

(a) P(E' ∪ F')        (b) P(E' ∩ F')
(c) P(E' ∪ F)         (d) P(E' ∩ F)
(e) P(E ∩ F')         (f) P((E ∩ F)')
4.14. Generalize Theorem 4.4 by showing that the probability of the
occurrence of at least one among three events E_1, E_2, and E_3 is given by

$$P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3) - P(E_1 \cap E_2) - P(E_1 \cap E_3) - P(E_2 \cap E_3) + P(E_1 \cap E_2 \cap E_3). \tag{4.6}$$

[Note: You will find a Venn diagram like the one in Figure 7 helpful
in checking that the probability of each simple event making up
E_1 ∪ E_2 ∪ E_3 is counted once and only once in the expression on the
right in (4.6).]

4.15. From a standard deck we select one card at random. Use Formula
(4.6) to find the probability that the card is a spade, an honor card, or
a deuce.

4.16. Use Formula (4.6) to find the probability that a number selected at
random from the first 200 positive integers is divisible by 6 or 8 or 10.

4.17. We make a definition and then state a theorem. Use Formula (4.6) to
prove the theorem.

Definition 4.2 Let k be any integer greater than 1. Events E_1, E_2,
..., E_k are said to be mutually exclusive in pairs if and only if all
possible pairs of events from E_1, E_2, ..., E_k are mutually exclusive,
i.e., E_i ∩ E_j = ∅ for all i ≠ j where i and j can assume the values
1, 2, ..., k.

Theorem 4.8 If E_1, E_2, and E_3 are mutually exclusive in pairs, then

$$P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3). \tag{4.7}$$

4.18. Suppose we assume only that E_1 ∩ E_2 ∩ E_3 = ∅. Show by example
that (4.7) does not necessarily hold. (Cf. Problem 1.4.6b.)

4.19. Prove the following generalization of Theorem 4.8 by mathematical
induction.

Theorem 4.9 Let k be any integer greater than 1 and suppose the
events E_1, E_2, ..., E_k are mutually exclusive in pairs. Then

$$P(E_1 \cup E_2 \cup \cdots \cup E_k) = P(E_1) + P(E_2) + \cdots + P(E_k). \tag{4.8}$$

4.20. Modify the hypothesis in Theorem 4.9 so that E_1, E_2, ..., E_k are any
events, not necessarily mutually exclusive in pairs. With this weaker
hypothesis, prove the following weaker result:

$$P(E_1 \cup E_2 \cup \cdots \cup E_k) \le P(E_1) + P(E_2) + \cdots + P(E_k).$$

4.21. (a) Find the probability of at least one match when using a deck of
three cards. (Cf. Problem 3.9.)

(b) Find the probability of at least one match using four numbered
squares and four cards. (First define a sample space and make an
acceptable assignment of probabilities to its simple events.)

(c) We want to find a formula for the probability of at least one
match when using N squares and a deck of N cards (numbered
1, 2, ..., N), where N is any positive integer. Define a suitable
sample space for this experiment, determine the number of
elements in this sample space, and then make an acceptable
assignment of probabilities to its simple events. Note that we could
find the probability of at least one match if we had a formula for
P(E_1 ∪ E_2 ∪ ... ∪ E_N), where E_j denotes the event that a match
occurs at card number j. Can you guess this formula by detecting
a pattern in Formulas (4.2) and (4.6)? If not, then first use (4.2)
and (4.6) to derive a formula for the special case N = 4, and then
try guessing again. The proof of the correct general formula and
its use to find the probability of at least one match require counting
techniques that we have not yet discussed.* But even when we
can't complete a problem, it is useful to think about it and try
to see what we need to learn in order to be able to complete it.
This problem is the famous problem of rencontre in probability
theory and was originally discussed by the French mathematician
Montmort (1678-1719).


5. Conditional probability and compound experiments 


Suppose an experiment is performed and we are interested in the 
probability of some event E. But now assume that we are given ad- 
ditional information, namely, that another event 7? has occurred. In 
this section, we discuss how the computation of the probability of E 
is affected by the information that F is known to have occurred. 

It is helpful first to take a close look at an example in which we can 
find reasonable answers on intuitive grounds. The methods we em- 
ploy in this simple example will lead us to formulate precise defini- 
tions that will become part of our mathematical theory. 


Example 5.1. A club with five male and five female charter mem- 
bers elects two women and three men to membership. From the total 
of 15 members, one person is selected at random. We are interested
in two events: 

E = person selected is a male, 
F = person selected is a charter member. 


*See W. Feller, An Introduction to Probability Theory and Its Applications,
2nd edition, John Wiley and Sons, Inc., 1957, pp. 88-91. For another solution and
interesting historical comments, see I. Todhunter, A History of the Mathematical 
Theory of Probability, Chelsea Publishing Co., 1949, pp. 91-93.

As sample space we take a set S of 15 elements, one for each club
member. Since the selection is “at random” we assign probability
1/15 to each simple event of S. Observing that E is the union of eight
simple events, F the union of ten simple events, and E ∩ F the union
of five simple events, we calculate

P(E) = 8/15,   P(F) = 10/15 = 2/3,   P(E ∩ F) = 5/15 = 1/3.


So far we have nothing new. But now suppose we are informed 
that the person selected is a charter member. What is the probability
of E, now that this fact about the outcome of the experiment has
been made known to us? Most people quickly answer that the revised
probability of E should be 1/2. They reason as follows: Since
F is known to have occurred, we know that one of the ten charter members
was selected. The event E occurs if one of the five male charter
members is selected. Because the selection is at random, the probability
of selecting one of the five males from the ten charter members
is 1/2. If we introduce the symbol P(E|F) to denote this revised or
conditional probability of E given F, then

P(E) = 8/15 and P(E|F) = 1/2.
Thus, in this example the probability of E decreases due to the added 
information that event F has occurred. 
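
The intuitive argument of Example 5.1 can be checked by listing the club members. This is a minimal sketch under the example's own data (five male and five female charter members, plus two women and three men elected later); the tuple encoding and the helper name prob are illustrative choices, not part of the text.

```python
from fractions import Fraction

# One entry per club member: (sex, is_charter_member).
members = (
    [("M", True)] * 5 + [("F", True)] * 5      # charter members
    + [("F", False)] * 2 + [("M", False)] * 3  # newly elected members
)


def prob(event, space):
    """Probability of `event` when every element of `space` is equally likely."""
    return Fraction(sum(1 for m in space if event(m)), len(space))


def is_male(member):
    return member[0] == "M"


# P(E): select from all 15 members.
p_e = prob(is_male, members)

# P(E|F): once F is known, only the ten charter members remain possible.
charter_members = [m for m in members if m[1]]
p_e_given_f = prob(is_male, charter_members)

if __name__ == "__main__":
    print("P(E)   =", p_e)
    print("P(E|F) =", p_e_given_f)
```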


This informal and intuitive reasoning can be described in another
way. Ordinarily, given a sample space S and an acceptable assignment
of probabilities to the simple events of S, we compute the probability
of an event E by adding the probabilities of the simple events
whose union is E. Since P(S) = 1 and E ∩ S = E, we can write
the identity

(5.1) P(E) = P(E ∩ S)/P(S),

which shows that P(E) is the ratio of the probability of that part of
E included in S (which happens to be all of E) to the probability of
S itself (which happens to be 1).

But if we are told that event F has occurred, then the outcomes
corresponding to elements of F’, the complement of F, are no longer 
possible. Hence, in the light of our added information about the out- 
come of the experiment, the event F replaces the sample space S as 


the set whose elements correspond to all possible outcomes of the
experiment. With this in mind, observe how reasonable it appears to
write, in analogy with (5.1),

(5.2) P(E|F) = P(E ∩ F)/P(F),

which says that P(E|F), the conditional probability of E given F, is
the ratio of the probability of that part of E included in F (which is
E ∩ F) to the probability of F itself.

Applied to the problem in Example 5.1, this ratio is

P(E|F) = (1/3)/(2/3) = 1/2,

as before.


Formula (5.2) is the basis of our formal definition of conditional 
probability. 


Definition 5.1. Let E and F be two events of a sample space S.
Suppose an acceptable assignment of probabilities has been made to 
the simple events of S in such a way that P(F) > 0. Then the con- 
ditional probability of E given F, denoted by P(E|F), is defined by 


Equation (5.2). The conditional probability of E given F is undefined
if P(F) = 0. 


Formulas (5.1) and (5.2) show that the role of F in computing 
P(E|F) is analogous to the role of S in computing P(E). It is helpful 
to carry this analogy further. When we are told that F has occurred, 
then F can be considered as a new sample space, since all possible 
outcomes of the experiment must now correspond to elements of F. 
Then we must be sure to have the probabilities of the simple events 
of F add to 1, as they must for any sample space. But they actually 
add to P(F). If P(F) = 1, then no changes are required. However, 
if P(F) <1, we imagine the probabilities of all simple events of F 
increased proportionately by dividing each by the same number P(F).
We thus obtain new probabilities for the simple events of F. In view 
of the relation between original and new probabilities of simple events 
of F, Formula (5.2) can be paraphrased as follows: The conditional 
probability of E given F is the sum of the new probabilities of those 
simple events whose union is the event E ∩ F, i.e., whose union is
the part of E included in the new sample space F. 

Thus P(E|F) is simply a probability calculated for events con- 
sidered as subsets of the new sample space F. It follows that the 
formulas we proved in Section 4 for probabilities relative to the
sample space S apply without modification to conditional probabilities
relative to the information that a fixed event F has occurred.
(See Problem 5.16.) 


Example 5.2. Three fair coins are tossed, one after the other. Let
E be the event “at least two heads” and F the event “first coin falls
heads.” We define the usual sample space containing the eight outcomes
HHH, HHT, ..., TTT and assign each simple event the
probability 1/8. Then E is the union of four simple events, F the union
of four simple events, and E ∩ F the union of three simple events.
Hence P(E) = P(F) = 1/2 and P(E ∩ F) = 3/8. Thus, the conditional
probability of E given F is

P(E|F) = P(E ∩ F)/P(F) = (3/8)/(1/2) = 3/4.

As expected, the added knowledge that the first coin falls heads increases
the probability of getting at least two heads. Before this additional
information is revealed, P(E) = 1/2. Afterwards, P(E|F) = 3/4.
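
Example 5.2 can be computed in two ways that should agree: directly from Formula (5.2), and by treating F as a new sample space whose simple-event probabilities are each divided by P(F). The sketch below does both with exact fractions; the names P, by_formula, and by_renormalizing are illustrative, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses: HHH, HHT, ..., TTT, each with probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]
p = {o: Fraction(1, 8) for o in outcomes}

E = {o for o in outcomes if o.count("H") >= 2}  # at least two heads
F = {o for o in outcomes if o[0] == "H"}        # first coin falls heads


def P(event):
    """Add the probabilities of the simple events whose union is `event`."""
    return sum((p[o] for o in event), Fraction(0))


# Route 1: Formula (5.2).
by_formula = P(E & F) / P(F)

# Route 2: F as the new sample space, probabilities rescaled to sum to 1.
p_new = {o: p[o] / P(F) for o in F}
by_renormalizing = sum((p_new[o] for o in E & F), Fraction(0))

if __name__ == "__main__":
    print(P(E), by_formula, by_renormalizing)
```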


Example 5.3. A person is selected at random from among 321
union men whose opinions were reported in Table 4 on p. 24. Let
E denote “man answers yes” and F “man is in union less than one
year.” Then we compute (as in Problem 3.6),

P(E) = 33 P(F) = s» P(E NF) = i$ 
Therefore, 


21. 
P(E) = n= i 
Bat 


Note that P(E|F) < P(E); i.e., the knowledge that the man is in the
union less than one year decreases the probability that he answers
yes.

Example 5.4. A card is selected at random from a standard deck.
Let E denote “card is a spade” and F “card is an ace.” Then

P(E) = 1/4,   P(F) = 1/13,   P(E ∩ F) = 1/52,

and

P(E|F) = (1/52)/(1/13) = 1/4.

Here we have a case in which P(E) and P(E|F) are equal: the knowledge
that the card is an ace does not change the probability that it
is a spade.

The important but special case when P(E) and P(E|F) are equal,
as in Example 5.4, will be discussed in Section 7 where we introduce 
the concept of independent events. In the remainder of this section, 
we consider some consequences of Definition 5.1, as well as an appli- 
cation of conditional probabilities to so-called compound or composite 
experiments. 

If P(E) > 0, the roles of E and F in (5.2) can be interchanged.
Then the conditional probability of F given E is

(5.3) P(F|E) = P(F ∩ E)/P(E) = P(E ∩ F)/P(E),

the last equality following from the commutative law for the intersection
of two sets.

By solving (5.2) and (5.3) for P(E ∩ F), we obtain the following
result, sometimes referred to as the theorem on compound probabilities:

(5.4) P(E ∩ F) = P(E)P(F|E) = P(F)P(E|F).

Formula (5.4) finds extensive use when we compute probabilities 
for events defined in terms of a compound experiment. For example, 
the experiment in which we toss a coin, toss it again and then toss it 
a third time is an example of a compound experiment with three 
trials. If we have two urns containing colored balls and we choose an 
urn and then a ball from that urn, we have performed a compound 
experiment with two trials. Many experiments are most conveniently 
described as a compounding of two or more trials: first, something 
is done (trial number 1); then, after the first trial is completed, something
else is done (trial number 2); etc. An example will best serve
to illustrate the use of conditional probabilities in such compound
experiments.

Example 5.5. Urn I contains three green and five red balls. Urn
II contains two green, one red, and two yellow balls. We select an urn
at random and then draw one ball at random from that urn. What is
the probability that we obtain a green ball?

[Figure 12: tree diagram. Branches of probability 1/2 lead from the
starting point to Urn I and to Urn II. From Urn I: Green 3/8, Red 5/8.
From Urn II: Green 2/5, Red 1/5, Yellow 2/5.]

The data of the problem are conveniently summarized in the tree 
diagram of Figure 12. Since the urn is selected at random, we write
probability 1/2 on each branch leading from the starting point to an
outcome of the first trial (number of the urn). We are also given 
conditional probabilities of drawing a ball of a specified color, given 
the urn selected. These conditional probabilities appear on each 
branch leading from an urn to an outcome of the second trial (color 
of ball). We give two solutions to the problem posed in this example. 
Solution 1. The event “green ball selected” can occur in one of
these two mutually exclusive ways: (1) select urn I and draw a green
ball, or (2) select urn II and draw a green ball. Hence the event
“green ball selected” is the union of the mutually exclusive events
described in (1) and (2). By Formula (4.3), we obtain (with obvious
shorthand notation for events),

P(green) = P(urn I and green) + P(urn II and green).

Each of the terms on the right is the probability of an intersection of
two events. Applying Formula (5.4),

P(green) = P(urn I)P(green|urn I) + P(urn II)P(green|urn II),

and using the data summarized in Figure 12,

(5.5) P(green) = (1/2)(3/8) + (1/2)(2/5) = 31/80.

Solution 2. We go back to first principles. Let us define as sample
space for this compound experiment the set

S = {Ig, Ir, IIg, IIr, IIy}

whose elements are ordered pairs denoting the outcomes of the two
trials making up the experiment. Thus, Ig denotes the outcome for
which urn I is selected and then the green ball drawn, etc.
Each of the five simple events of S corresponds to one path from
left to right through the tree in Figure 12. We assign to each simple
event the probability given by the product of the numbers appearing
in the tree along the path to which that simple event corresponds.
Using this rule, we obtain the probabilities listed in Table 12 and
appearing at the end of each path through the tree in Figure 12.
Let us note that this is an acceptable assignment of probabilities to
the simple events of S: each probability is nonnegative and their
sum is 1.


TABLE 12

Simple Event of S     Probability

{Ig}                  (1/2)(3/8) = 3/16
{Ir}                  (1/2)(5/8) = 5/16
{IIg}                 (1/2)(2/5) = 1/5
{IIr}                 (1/2)(1/5) = 1/10
{IIy}                 (1/2)(2/5) = 1/5

Now the event “green ball selected” is the union of the two simple
events {Ig} and {IIg}. Hence

(5.6) P(green) = 3/16 + 1/5 = 31/80,
as in Solution 1. 
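
The path-product rule of Solution 2 is easy to mechanize. The following sketch encodes the tree of Example 5.5 as a nested dictionary (an illustrative encoding, not something prescribed by the text), multiplies the branch probabilities along each path, and adds the paths that end in a green ball.

```python
from fractions import Fraction

# urn -> (probability of choosing the urn, {color: P(color | urn)})
tree = {
    "I": (Fraction(1, 2), {"g": Fraction(3, 8), "r": Fraction(5, 8)}),
    "II": (
        Fraction(1, 2),
        {"g": Fraction(2, 5), "r": Fraction(1, 5), "y": Fraction(2, 5)},
    ),
}

# Each simple event (path) gets the product of its branch probabilities.
path_prob = {
    urn + color: p_urn * p_color
    for urn, (p_urn, colors) in tree.items()
    for color, p_color in colors.items()
}

# "Green ball selected" is the union of the paths ending in g.
p_green = sum(pr for path, pr in path_prob.items() if path.endswith("g"))

if __name__ == "__main__":
    for path, pr in path_prob.items():
        print(path, pr)
    print("P(green) =", p_green)
```

The assignment is acceptable in the sense used above: every path probability is nonnegative and the five of them add to 1.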


The reader may object that in Solution 1 we have violated our rule 
requiring the designation of a sample space and an assignment of 
probabilities to its simple events before probabilities can be computed. 
Strictly speaking, this claim is correct. But by comparing (5.5) and 
(5.6) we observe that the sample space and assignment of probabilities
in Solution 2 were implicit in Solution 1. Indeed, let us agree
that a compound experiment of n trials will always have as sample
space the set S of ordered n-tuples denoting possible outcomes of the 
experiment. If we are given enough data (in the form of certain 
probabilities and conditional probabilities) to fill in a tree diagram 
like the one in Figure 12, then an acceptable assignment of proba- 
bilities to simple events of S is made as in Solution 2: Each simple 
event corresponds to one path through the tree, and the product of 
the numbers appearing on the branches of a path is the probability 
assigned to its corresponding simple event. It can be shown (see 
Problem 5.17) that the resulting assignment of probabilities to the 
simple events of S is not only acceptable, but is the only assignment 
consistent with the data of the problem.

Solution 1 is shorter, more direct, and easier than Solution 2. It is
typical of many problems involving compound experiments that we
choose to compute unknown probabilities by using the data of the
problem directly, and thus bypass the explicit construction of a
sample space and assignment of probabilities to its simple events.
We shall adopt the shorter direct solution from now on, but with the 
knowledge that we could, if called upon to do so, go back to first 
principles and complete the longer, less direct solution that underlies 
the shorter procedure. 


The theorem on compound probabilities, as expressed in (5.4), is a
special case of an extremely useful formula which we now prove.

Theorem 5.1. If n is any integer (n ≥ 2) and E_1, E_2, ..., E_n are
any n events for which P(E_1 ∩ E_2 ∩ ··· ∩ E_{n-1}) ≠ 0, then

(5.7) P(E_1 ∩ E_2 ∩ ··· ∩ E_n) = P(E_1)P(E_2|E_1)P(E_3|E_1 ∩ E_2)
                                  ··· P(E_n|E_1 ∩ E_2 ∩ ··· ∩ E_{n-1}).

Proof. Denote by S_n the statement expressed by Equation (5.7)
and let N denote the set of those integers n for which S_n is true. We
use the method of mathematical induction to prove that N is the set
of all integers greater than 1.

(i) 2 ∈ N. For S_2 is the statement

P(E_1 ∩ E_2) = P(E_1)P(E_2|E_1).

That S_2 is true follows from the definition of P(E_2|E_1). Note that our
hypothesis reduces to P(E_1) ≠ 0 when n = 2, so that the conditional
probability is defined.

(ii) Now assume k ∈ N, where k is any integer greater than 1. We
want to prove that also (k + 1) ∈ N. But S_k is the statement

(5.8) P(E_1 ∩ E_2 ∩ ··· ∩ E_k)
        = P(E_1)P(E_2|E_1) ··· P(E_k|E_1 ∩ E_2 ∩ ··· ∩ E_{k-1}).

We verify that by the definition of conditional probability (and using
properties of set intersection),

(5.9) P(E_1 ∩ E_2 ∩ ··· ∩ E_{k+1}) / P(E_1 ∩ E_2 ∩ ··· ∩ E_k)
        = P(E_{k+1}|E_1 ∩ E_2 ∩ ··· ∩ E_k).

Multiplying corresponding sides of Equations (5.8) and (5.9) yields

P(E_1 ∩ E_2 ∩ ··· ∩ E_{k+1})
        = P(E_1)P(E_2|E_1) ··· P(E_{k+1}|E_1 ∩ E_2 ∩ ··· ∩ E_k),

which is precisely the statement S_{k+1}. Hence we have shown that
if k ∈ N, then (k + 1) ∈ N.

We conclude from (i) and (ii) that N is the set of all integers
greater than or equal to 2, and thus Theorem 5.1 is proved.
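
A numerical check of Formula (5.7) for n = 3 may make the theorem more concrete. The sketch below uses an illustrative experiment chosen for this purpose (an assumption, not part of the proof): three balls are drawn without replacement from an urn holding two red and three green balls, and E_1, E_2, E_3 are "first ball red," "second ball green," "third ball red." Both sides of (5.7) are computed from the same equally likely sample space of ordered draws.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R", "R", "G", "G", "G"]

# Ordered triples of distinct ball positions; all 60 are equally likely.
outcomes = list(permutations(range(len(balls)), 3))


def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def cond(event, given):
    """P(event | given) computed from Definition 5.1."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


def e1(o):
    return balls[o[0]] == "R"


def e2(o):
    return balls[o[1]] == "G"


def e3(o):
    return balls[o[2]] == "R"


# Left side of (5.7): probability of the intersection.
lhs = prob(lambda o: e1(o) and e2(o) and e3(o))

# Right side of (5.7): product of successive conditional probabilities.
rhs = prob(e1) * cond(e2, e1) * cond(e3, lambda o: e1(o) and e2(o))

if __name__ == "__main__":
    print(lhs, rhs)
```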


Example 5.6. You are told that an urn contains x red and 5 - x
green balls, but the value of x (which can be 0, 1, 2, 3, 4, or 5) is not
disclosed to you. Mr. Y is to draw a ball at random from the five
balls in the urn, and you must guess the color of the ball he draws.
We shall say that you win if you guess correctly and lose otherwise.

Let us consider each of the following strategies for making your
guess. 


Strategy 1. Guess that Y will draw a red ball. 

Strategy 2. Guess that Y will draw a green ball. 

Strategy 3. First draw a ball from the urn. If it is red, then guess
Y will draw a red ball. If it is green, then guess Y will draw a green 
ball. 

Strategy 4. Draw a ball from the urn and replace it. Then draw 
another ball and replace it. If both balls are red, then guess Y will 
draw a red ball. If both balls are green, then guess Y will draw a 
green ball. If you draw one red and one green ball, then draw one 
more ball from the urn. If this ball is red, then guess Y will draw a 
red ball. If it is green, then guess Y will draw a green ball. 

Strategy 5. Same as Strategy 4, except that the first ball is not 
replaced before the second is drawn. Also, if a third draw is required, 
it is done without replacing the first two balls. 


We are interested in calculating the probability that you win, i.e.,
your guess is correct. This probability will depend on the unknown
value of x (which determines the composition of the urn) and on the
strategy you decide to use. For example, if you choose Strategy 1,
then you guess red. Y draws a red ball from the urn with probability
x/5. Putting x = 0, 1, 2, 3, 4, 5 in turn, we get the probabilities of
winning listed in Table 13 in the column headed Strategy 1. In that
table, we list the probability of winning for each possible composition
of the urn and for each of the five available strategies.

[Table 13: Probability of Winning with Strategy 1, 2, 3, 4, 5 (columns),
by Number of Red Balls in Urn, x = 0, 1, ..., 5 (rows).]

To see how these probabilities are calculated, consider a few examples.

(a) Suppose x = 2 and you adopt Strategy 3. Then you win whenever
you and Y both draw red balls or both draw green balls. In
order to simplify the notation, let us write R_1 to denote the event
that the first ball you draw is red, R_Y the event that Y draws a red
ball, G_2 the event that the second ball you draw is green, etc. Since
the events R_1 ∩ R_Y and G_1 ∩ G_Y are mutually exclusive,

P(win) = P(R_1 ∩ R_Y) + P(G_1 ∩ G_Y)
       = P(R_1)P(R_Y|R_1) + P(G_1)P(G_Y|G_1),

the last equality following from Formula (5.4). But since Y draws
from the full urn which we are assuming contains two red and three
green balls,

P(R_Y|R_1) = P(R_Y) = .4,   P(G_Y|G_1) = P(G_Y) = .6.

Also P(R_1) = .4, P(G_1) = .6.
So we compute

P(win) = (.4)(.4) + (.6)(.6) = .52.


This probability appears in Table 13 in the row labeled x = 2 and
column headed Strategy 3. In Figure 13, we have drawn the tree
diagram for this experiment when you use Strategy 3. We have written
next to each branch the probability that applies when x = 2.
The reader would do well to follow on the diagram the computation
we have just completed.

[Figure 13: tree diagram for Strategy 3 with x = 2. Columns: first
ball, your guess, Y's ball, outcome, probability.
 R (.4), guess R: Y draws R (.4), win, .16; Y draws G (.6), lose, .24.
 G (.6), guess G: Y draws R (.4), lose, .24; Y draws G (.6), win, .36.]



(b) Suppose x = 2 and you adopt Strategy 5. The tree diagram
for Strategy 5 is much more complicated, and is drawn in Figure 14.
Here too, we have written next to each branch the appropriate probabilities
for the case x = 2.

[Figure 14: tree diagram for Strategy 5 with x = 2. Columns: first
ball, second ball, your guess, third ball, your guess, Y's ball,
outcome, probability. The twelve paths, in order:
 R (.4), R (.25), guess R: Y draws R (.4), win, .04; Y draws G (.6), lose, .06.
 R (.4), G (.75), third R (.33), guess R: Y draws R, win, .04; Y draws G, lose, .06.
 R (.4), G (.75), third G (.67), guess G: Y draws R, lose, .08; Y draws G, win, .12.
 G (.6), R (.5), third R (.33), guess R: Y draws R, win, .04; Y draws G, lose, .06.
 G (.6), R (.5), third G (.67), guess G: Y draws R, lose, .08; Y draws G, win, .12.
 G (.6), G (.5), guess G: Y draws R, lose, .12; Y draws G, win, .18.]

We note that there are now six mutually exclusive ways of winning.
For example, you win if the experiment results in the event
R_1 ∩ G_2 ∩ R_3 ∩ R_Y in which you first draw a
red, then a green, then another red (and thus guess red) and then Y
draws a red ball. This event corresponds to the third path through
the tree in Figure 14. We find the probability of this event by applying
Formula (5.7) with n = 4.

(5.10) P(R_1 ∩ G_2 ∩ R_3 ∩ R_Y)
         = P(R_1)P(G_2|R_1)P(R_3|R_1 ∩ G_2)P(R_Y|R_1 ∩ G_2 ∩ R_3).

But again

P(R_Y|R_1 ∩ G_2 ∩ R_3) = P(R_Y) = .4 = P(R_1).

Given that you drew a red ball first, since you do not replace this ball,
there remain four balls in the urn, of which three are green. Hence

P(G_2|R_1) = .75.

If you get a red and a green, then the third ball is selected from an
urn containing one red and two green balls. Hence

P(R_3|R_1 ∩ G_2) = 1/3 = .33, approximately.

Putting these values in (5.10) we find

P(R_1 ∩ G_2 ∩ R_3 ∩ R_Y) = (.4)(.75)(.33)(.4) = .04.


This probability appears at the end of the third path through the 
tree in Figure 14 and is just the product of the probabilities for the 
branches of that path. Adding the probabilities of all events (paths) 
for which you win, we find that when x = 2 and you use Strategy 5
your probability of winning is .54. This number appears in the appropriate
row and column in Table 13. In this way all the entries in
Table 13 are computed.
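
The calculations in (a) and (b) can be repeated mechanically for every strategy and every value of x. The sketch below is one way to do so with exact fractions. The function names are hypothetical, and two readings of the strategy descriptions are assumed: Y always draws from the full urn of five balls, and the third draw in Strategy 4 is also made from the full urn (the first two balls having been replaced).

```python
from fractions import Fraction

N = 5  # number of balls in the urn


def win_given_guess(x, guess):
    """P(Y's ball matches `guess`) when Y draws from the full urn."""
    red = Fraction(x, N)
    return red if guess == "R" else 1 - red


def strategy1(x):
    return win_given_guess(x, "R")


def strategy2(x):
    return win_given_guess(x, "G")


def strategy3(x):
    r = Fraction(x, N)
    return r * win_given_guess(x, "R") + (1 - r) * win_given_guess(x, "G")


def strategy4(x):
    r = Fraction(x, N)
    g = 1 - r
    # Mixed first two draws (probability 2rg): a third draw from the
    # full urn decides the guess, which is Strategy 3 over again.
    return (r * r * win_given_guess(x, "R")
            + g * g * win_given_guess(x, "G")
            + 2 * r * g * strategy3(x))


def draws(red, total):
    """Yield (color, probability, red balls left) for one draw, no replacement."""
    if red > 0:
        yield "R", Fraction(red, total), red - 1
    if total - red > 0:
        yield "G", Fraction(total - red, total), red


def strategy5(x):
    win = Fraction(0)
    for c1, p1, red1 in draws(x, N):
        for c2, p2, red2 in draws(red1, N - 1):
            if c1 == c2:
                win += p1 * p2 * win_given_guess(x, c1)
            else:
                for c3, p3, _ in draws(red2, N - 2):
                    win += p1 * p2 * p3 * win_given_guess(x, c3)
    return win


STRATEGIES = [strategy1, strategy2, strategy3, strategy4, strategy5]

if __name__ == "__main__":
    for x in range(N + 1):
        print(x, "  ".join(f"{float(s(x)):.2f}" for s in STRATEGIES))
```

For x = 2 the functions reproduce the two values worked out in the text, 13/25 = .52 for Strategy 3 and 27/50 = .54 for Strategy 5.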
All other things being equal, you prefer the strategy which gives 


you the highest probability of winning. Thus, referring to Table 13, 
since the probabilities in Column 4 are at least as large as those in 
Column 3 for all possible compositions of the urn, you prefer Strategy 
4 to Strategy 3. For the same reason, Strategy 5 is preferred to 
Strategy 4. However, if for some reason, you are sure that the urn
contains more red than green balls (i.e., x = 3, 4, or 5), then you
might reasonably prefer Strategy 1 over any of the other strategies. 
A complete analysis is not possible here, but it is clear that the strat- 
egy you prefer will depend upon a number of factors that we have 
omitted from our discussion. For example, there may be a prize for 
winning and a penalty for losing. You may have to pay for the in- 
formation you get by drawing one or more balls from the urn before 
making your guess. Thus, Strategy 3 may cost you more than Strat- 
egy 1 or 2, and Strategies 4 and 5 may cost still more than Strategy 3. 
The strategy you prefer will depend upon all of these factors, as well 
as on your belief about the composition of the urn. References 2 and 
10 in the supplementary reading list at the end of this chapter may 
be consulted for discussions of how to evaluate strategies and why 
this is of great importance in statistical decision theory. 


PROBLEMS

5.1. A green and red die are rolled.

(a) Find the conditional probability of obtaining a sum greater than
10, given that the red die resulted in a 5.

(b) Find the conditional probability of obtaining a sum less than 6,
given that the red die resulted in a 2.

(c) Find the conditional probability of obtaining sum 7, given that the
red die resulted in a number less than 4.

(d) In parts (a)-(c), how does the given information affect the
probability of the event in question?


5.2. A fair coin is tossed three successive times. Find the odds for obtaining
three heads. How do the odds change if it is given that the second toss
resulted in a head?


5.3. Three indistinguishable objects are distributed in three cells. Find the
conditional probability that all three occupy the same cell, given that
at least two of them are in the same cell.


5.4. A committee of three is selected from six people A, B, C, D, E, and F.
[Cf. Problem 3.3.] Find the conditional probability of A and B being
selected, given that neither C nor D were selected.


5.5. Two people are selected (one after the other) at random from the 321
union men whose opinions are recorded in Table 4 on page 24. Find the
probability that both men answered “yes.”


5.6. Students in a summer school program took two courses: Chemistry and
History. The registrar reports that 4 percent failed Chemistry, 3 percent
failed History, and 1 percent failed both Chemistry and History.

(a) What percentage passed Chemistry and failed History?

(b) Among those who failed Chemistry, what percentage also failed
History?

(c) Among those who failed History, what percentage also failed
Chemistry?


5.7. (a) A fair coin is tossed three successive times.

(i) Find the probability that the third toss results in heads.
(ii) Find the conditional probability that the third toss results in
heads, given that the first two tosses result in heads.

(b) A fair coin is tossed N successive times, where N is a positive
integer.

(i) Define a suitable sample space and make the appropriate
assignment of probabilities to its simple events.
(ii) Find the probability that the Nth toss results in heads.
(iii) Find the conditional probability that the Nth toss results in
heads, given that all preceding tosses result in heads. (Question:
Does the coin have a memory?)


5.8. The manager of a retail grocery store advertises the following
promotional scheme in the newspaper. During a specified week, each customer
purchasing at least $10 worth of groceries at one time will receive a
numbered ticket. At the same time, the cashier places a duplicate ticket
in a large bowl. Tickets are numbered serially, starting with number 1.
At the end of a week, the tickets in the bowl are thoroughly stirred,
one ticket is chosen, and its number determines the winner of a
previously announced cash prize. Let us suppose that 200 tickets are
distributed during the week.

(a) What is the probability that the first digit of the winning number
is 1?

(b) What is the conditional probability that the first digit of the
winning number is 1, given that the winning number is greater
than 100?

(c) What is the probability that the first digit of the winning number
is 9?

(d) What is the probability that the first digit of the winning number
is 9, if it is known that the winning number is greater than 100?

(e) Suppose the number of tickets distributed is a positive integer,
say N. We want each of the nine digits (1, 2, 3, ..., 9) to have the
same probability of being the first digit of the winning number.
What are all the possible values of N?


5.9. Two defective radio tubes get mixed up with two good ones. You start
testing the tubes, one by one, until you have discovered both defectives.

(a) Construct a tree diagram for this experiment.

(b) What is the probability that the second defective tube will be the
second tested? the third tested? the fourth tested? What is the
sum of the three probabilities you computed? Is this a sum that
could have been expected before doing the computations?


5.10. An urn contains g green and r red balls. One ball is drawn at random.
It is replaced and c more balls of the same color are added to the urn,
where c is some positive integer. Another ball is drawn at random from
the urn and this ball, together with c more of the same color are again
added to the urn. This procedure can be repeated any number of times
and supplies a model (first studied by G. Polya) in which the drawing
of a ball of either color increases the probability of the same color
in the next drawing. (Polya drew the analogy with contagious diseases,
where each case of a disease increases the probability of further cases.)

(a) Find the conditional probability that the second ball is red, given
that the first ball is red.

(b) Find the probability that the first three drawings all result in red
balls.

(c) Find the conditional probability that the first ball is red, given
that the second ball is red.


5.11. A drawer contains four black, six brown, and two blue socks. Two socks
are taken at random from the drawer, one after the other. What is the
probability that both socks will be of the same color?


5.12. Use Formula (5.7) to find the probability that r people selected at
random will all have different birthdays. Then find the probability
that at least two people among the r will have the same birthday, and
compare with the answer in (3.6).


5.13. Refer to Table 14, which is a fragment from the American Men Mortality
Table published in 1918 by the Actuarial Society of America.


TABLE 14

Rate of Mortality Per 1000, by Age at Issue of Policy (rows)
and Duration of Policy in Years (columns)

20     3.96   4.13
21     4.01   4.18
22     4.06   4.21


[For example, the entry 3.96 at age 20, duration of policy 3 years,
means that 0.00396 is the probability that a person now aged 23 who
was issued insurance at age 20 will die before attaining age 24. Similarly,
the entry 3.91 at age 22, duration 2 years, means that 0.00391
is the probability that a person now aged 24 who was issued insurance
at age 22 will die before attaining age 25.] Calculate the probability
that a man now aged 21 who was issued insurance a year ago will die
(a) between ages 21 and 22, (b) between ages 22 and 23, (c) between
ages 23 and 24.


5.14. (a) Let n be a positive integer and define _n p_x as the probability that a
person aged x years will survive n years. Put p_x = _1 p_x and show that

_n p_x = p_x p_{x+1} ··· p_{x+n-1}.

(b) Let l_0 be an arbitrary positive integer (an observed number of
newborn babies) and define for all x > 0,

l_x = l_0 (_x p_0),   d_x = l_x - l_{x+1}.

Give interpretations for l_x and d_x and show that

_n p_x = l_{x+n}/l_x.

(c) Let q_x = 1 - p_x. Show that

q_x = d_x/l_x.

The probability q_x is known in actuarial mathematics as the rate
of mortality at age x.


5.15. Let E and F be events and suppose P(F) is neither 0 nor 1. Show that
if P(E|F) > P(E), then P(E|F') < P(E).
State why this result is intuitively reasonable.


5.16. Prove the following laws, in each case assuming the conditional
probabilities are defined.

(a) P(F|F) = 1.
(b) P(∅|F) = 0.
(c) If E_1 ⊂ E_2, then P(E_1|F) ≤ P(E_2|F).
(d) P(E'|F) = 1 - P(E|F).
(e) P(E_1 ∪ E_2|F) = P(E_1|F) + P(E_2|F) - P(E_1 ∩ E_2|F).
(f) P(E|F') = [P(E) - P(E ∩ F)]/[1 - P(F)].
(g) If P(F) = 1, then P(E|F) = P(E).
(h) If P(F) > 0 and E and F are mutually exclusive events, then
P(E|F) = 0.


5.17. In Table 12, let the probabilities of the five simple events be a, b, c, d,
and e. We know that the sum of these numbers must be 1. Also, these
numbers must be consistent with the probabilities associated with the
branches of the tree in Figure 12 since these probabilities are given in
the statement of Example 5.5. Show that a, b, c, d, e must have the
values given in Table 12.


5.18. Refer to Example 5.6 of the text.

(a) By drawing tree diagrams or otherwise, verify the probabilities
given in Table 13.

(b) The tree diagram for Strategy 5 is drawn in Figure 14. Note that
the tree diagram for Strategy 4 is the same diagram. But the
probabilities associated with the branches of the tree are not the
same. For the case x = 2, what are the branch probabilities for
Strategy 4?

(c) Suppose you adopt the following strategy: You draw two balls from
the urn, not replacing the first before drawing the second. If you
get two red, then you guess Y will draw a red ball. Otherwise you
guess Y will draw a green ball. Draw a tree diagram for this
strategy and calculate the probability that you win for each possible
composition of the urn.


5.19. A seller of rebuilt television tubes and a buyer get together to draw up
a contract. The seller will supply tubes in lots of 100 tubes. The
buyer, when a lot is offered to him, wants to protect himself against
the possibility that the lot contains too many defective tubes. The
contract therefore provides that out of each lot, two tubes will be
selected at random and tested. The buyer considers the following
alternative plans as guides for making his decision in the light of the
experimental evidence.

Plan 1 If both of the tubes tested are satisfactory, then accept the
whole lot. Otherwise, reject the lot.

Plan 2 If both of the tubes tested are defective, then reject the whole
lot. Otherwise, accept the lot.

Plan 3 If both of the tubes tested are satisfactory, then accept the lot.
If both are defective, then reject the lot. If one of the tubes is
satisfactory and the other defective, then select a third tube
at random from the remaining 98 tubes in the lot and accept
or reject the lot according as this tube is satisfactory or
defective.

Denote by E the event that the buyer accepts the lot and let x be the
(unknown) number of defective tubes in a lot offered to the buyer.
Clearly, P(E) depends upon both the plan adopted by the buyer and
the quality of the lot as determined by the value of x.

(a) Obtain a formula for P(E) in terms of x for each of the three plans.

(b) Substitute the values x = 0, 5, 10, 20, 30, 40, 50 in the formulas
obtained in (a) and make three graphs plotting (for each plan)
the value of x along the horizontal axis and the value of P(E) along
the vertical axis. A graph of this kind is called an operating
characteristic curve (OC-curve) for a plan.

(c) Which rule is most favorable to the buyer? to the seller? (Note:
these questions become more interesting and more difficult when
such things as the utilities resulting from the desirable actions of
accepting good lots and rejecting poor lots, the disutilities from the
undesirable actions of accepting poor lots and rejecting good lots,
and the costs of testing tubes are brought into the analysis.)


6. Bayes' formula


In this section, as an application of conditional probabilities, we 
derive a famous formula first used by Thomas Bayes in a paper pub- 
lished posthumously in 1763. To prepare the way, we make a defi- 
nition and prove a preliminary result. 

Definition 6.1. A partition of a set $E$ is a set $\{E_1, E_2, \ldots, E_n\}$
with the following properties:

(i) $E_j \subset E$ $(j = 1, 2, \ldots, n)$

(ii) $E_j \cap E_k = \emptyset$ $(j = 1, 2, \ldots, n;\ k = 1, 2, \ldots, n;\ j \neq k)$

(iii) $E_1 \cup E_2 \cup \cdots \cup E_n = E$.


In words, a partition of a set $E$ is a set of subsets of $E$ [property (i)]
that are disjoint [property (ii)] and exhaustive [property (iii)].
Every element of $E$ is a member of one and only one of the subsets
in the partition.


We are already acquainted with partitions of sets. Two complementary
events $F$ and $F'$ form the partition $\{F, F'\}$ of the sample
space $S$. For $F$ and $F'$ are certainly subsets of $S$, and we have
$F \cap F' = \emptyset$ and $F \cup F' = S$, as required by Definition 6.1.

From a Venn diagram, we see immediately that $\{E \cap F, E \cap F'\}$
is a partition of the set $E$, $\{E \cap F', E \cap F, E' \cap F\}$ is a
partition of the set $E \cup F$, and $\{E \cap F, E \cap F', E' \cap F, E' \cap F'\}$
is a partition of the entire sample space $S$.

Two more examples of partitions should suffice to make the notion
clear. In the sample space $S$ of 52 elements, each denoting one
outcome of the experiment in which a card is selected from a standard
deck, let $E_s$, $E_h$, $E_d$, and $E_c$ denote the events that the card
selected is a spade, heart, diamond, and club respectively. Then
$\{E_s, E_h, E_d, E_c\}$ is a partition of $S$, since the four subsets are
clearly mutually exclusive in pairs and exhaustive. Another partition of $S$
is the set of all 52 simple events of the sample space $S$.

Theorem 6.1. Let $\{E_1, E_2, \ldots, E_n\}$ be a partition of the sample
space $S$, and suppose each of the events $E_1, E_2, \ldots, E_n$ has nonzero
probability. Let $E$ be any event. Then

$P(E) = P(E_1)P(E \mid E_1) + P(E_2)P(E \mid E_2) + \cdots + P(E_n)P(E \mid E_n)$

or, using the summation symbol,

(6.1)   $P(E) = \sum_{j=1}^{n} P(E_j)P(E \mid E_j).$


Proof. From the hypothesis that $\{E_1, E_2, \ldots, E_n\}$ is a partition of
$S$, it can be shown (see Problem 6.13) that
$\{E \cap E_1, E \cap E_2, \ldots, E \cap E_n\}$ is a partition of the event $E$.
Hence

$E = (E \cap E_1) \cup (E \cap E_2) \cup \cdots \cup (E \cap E_n)$

expresses $E$ as the union of $n$ mutually exclusive events. Applying
Formula (4.8) in Problem 4.19 yields the equation

(6.2)   $P(E) = P(E \cap E_1) + P(E \cap E_2) + \cdots + P(E \cap E_n).$

But, directly from the definition of conditional probability, we have
for $j = 1, 2, \ldots, n$,

$P(E \cap E_j) = P(E_j)P(E \mid E_j).$

Making this substitution in (6.2) proves the theorem. Note that we
have guaranteed the existence of the conditional probabilities in (6.1)
by our assumption that the events $E_1, E_2, \ldots, E_n$ do not have zero
probability.


As the following examples show, Formula (6.1) is useful because
an evaluation of the probabilities $P(E_j)$ and conditional probabilities
$P(E \mid E_j)$ is often easier than a direct calculation of $P(E)$.


Example 6.1. Freshmen account for 30%, sophomores 25%, juniors 25%,
and seniors 20% of the members of a college fraternity.
Fifty percent of the freshmen, 30% of the sophomores, 10% of the
juniors, and 2% of the seniors are enrolled in a mathematics course.
A member of the fraternity being chosen at random, what is the
probability that he is enrolled in a mathematics course?

We let $E$ denote "member selected is enrolled in a mathematics
course," and $E_1$, $E_2$, $E_3$, and $E_4$ denote the events that the member
selected is a freshman, sophomore, junior, and senior respectively.
Then $\{E_1, E_2, E_3, E_4\}$ is a partition of the sample space $S$ consisting
of all ordered pairs the first object of which identifies the class and
the second object the presence or absence of the student in a
mathematics course. In fact, the data of the problem specify all the
probabilities needed to apply Formula (6.1) with $n = 4$. We find

$P(E) = (.30)(.50) + (.25)(.30) + (.25)(.10) + (.20)(.02) = .254,$

or slightly more than 25 percent of the fraternity are enrolled in a
mathematics course.

Example 6.2. Find the probability that in a well-shuffled deck of
cards, the ace of spades is next to the king of spades. Here, as in the
preceding example, it seems sensible to break the problem up into
cases, i.e., to consider first the event in which the ace of spades is
the top card of the deck, then the event that it is the bottom card of
the deck, and finally the remaining event in which the ace of spades
is somewhere within the deck. Let $E_1$, $E_2$, and $E_3$ denote these events
in the order stated. We choose as sample space $S$ the set of ordered
52-tuples denoting all possible orderings of the 52-card deck. Then
$\{E_1, E_2, E_3\}$ is a partition of $S$. If $E$ denotes the event that the ace
and king of spades are neighboring cards, then noting that only one
card is next to the top or bottom cards but two cards are next to a
card within the deck, we find

$P(E_1) = P(E_2) = \tfrac{1}{52}, \quad P(E_3) = \tfrac{50}{52} = \tfrac{25}{26},$
$P(E \mid E_1) = P(E \mid E_2) = \tfrac{1}{51}, \quad P(E \mid E_3) = \tfrac{2}{51}.$

Hence, by Formula (6.1) with $n = 3$,

$P(E) = (\tfrac{1}{52})(\tfrac{1}{51}) + (\tfrac{1}{52})(\tfrac{1}{51}) + (\tfrac{25}{26})(\tfrac{2}{51}) = \tfrac{1}{26}.$
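The two computations above follow the same pattern, so they can be checked with one small routine. The following is only an illustrative sketch of Formula (6.1): the function name `total_probability` and the variable names are our own choices, not part of the text, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def total_probability(priors, conditionals):
    """Formula (6.1): sum over the partition of P(E_j) * P(E | E_j)."""
    if len(priors) != len(conditionals):
        raise ValueError("need one conditional probability per partition event")
    if sum(priors) != 1:
        raise ValueError("the probabilities of a partition of S must sum to 1")
    return sum(p * q for p, q in zip(priors, conditionals))


# Example 6.1: classes of the fraternity and their mathematics enrollment rates.
class_shares = [Fraction(30, 100), Fraction(25, 100), Fraction(25, 100), Fraction(20, 100)]
math_rates = [Fraction(50, 100), Fraction(30, 100), Fraction(10, 100), Fraction(2, 100)]
p_math = total_probability(class_shares, math_rates)

# Example 6.2: ace of spades on top, on the bottom, or within the deck.
ace_position = [Fraction(1, 52), Fraction(1, 52), Fraction(50, 52)]
king_adjacent = [Fraction(1, 51), Fraction(1, 51), Fraction(2, 51)]
p_adjacent = total_probability(ace_position, king_adjacent)

print(p_math, float(p_math))  # 127/500 0.254
print(p_adjacent)             # 1/26
```

The check that the partition probabilities sum to 1 is a cheap guard against forgetting one of the cases, which is the most common slip when a problem is broken up in this way.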


From Theorem 6.1 it is only a short step to Bayes' formula.


Theorem 6.2. Let $\{E_1, E_2, \ldots, E_n\}$ be a partition of the sample
space $S$, and suppose each of the events $E_1, E_2, \ldots, E_n$ has nonzero
probability. Let $E$ be any event for which $P(E) > 0$. Then for each
integer $k$ $(1 \le k \le n)$, we have Bayes' formula:

(6.3)   $P(E_k \mid E) = \dfrac{P(E_k)P(E \mid E_k)}{\sum_{j=1}^{n} P(E_j)P(E \mid E_j)}.$


Proof. Applying the definition of conditional probability twice, we
find

$P(E_k \mid E) = \dfrac{P(E \cap E_k)}{P(E)} = \dfrac{P(E_k)P(E \mid E_k)}{P(E)}.$

The theorem is proved by rewriting $P(E)$ according to Formula (6.1).


The following example illustrates the use of Bayes' formula.
Example 6.3. Suppose that the reliability of a chest X-ray test for
the detection of tuberculosis is specified as follows: of people with
tuberculosis, 90% of the X-ray examinations detect the disease but
10% go undetected. Of people free of tuberculosis, 99% of the X rays
are judged free of the disease, but 1% are diagnosed as showing
tuberculosis. From a large population of which only 0.1% have
tuberculosis, one person is selected at random, given a chest X ray, and
the radiologist reports the presence of tuberculosis. What is the
probability that the person has tuberculosis?

We let $E_1$ denote the event that the person selected has tuberculosis,
and $E$ the event that the person's X ray is diagnosed as positive,
i.e., as showing tuberculosis. We seek $P(E_1 \mid E)$. Now $\{E_1, E_1'\}$ is a
partition of the sample space of all people in the population. We are
given the following probabilities:

$P(E_1) = .001, \quad P(E_1') = .999, \quad P(E \mid E_1) = .9, \quad P(E \mid E_1') = .01.$

From Bayes' formula, we find

(6.4)   $P(E_1 \mid E) = \dfrac{P(E_1)P(E \mid E_1)}{P(E_1)P(E \mid E_1) + P(E_1')P(E \mid E_1')} = \dfrac{(.001)(.9)}{(.001)(.9) + (.999)(.01)} = .083$, approximately.

Note that although the X-ray test is fairly reliable, we have found that
only slightly more than 8% of those with positive X rays turn out
to have tuberculosis. The results of such calculations must be taken
into account when large-scale medical diagnostic tests are planned.
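As a check on the arithmetic of Example 6.3, Bayes' formula can be written as a short routine. This is a minimal sketch, not a prescribed method: the name `bayes_posteriors` is hypothetical, and the function simply divides each product P(E_j)P(E|E_j) by their sum, exactly as in the statement of Theorem 6.2.

```python
from fractions import Fraction


def bayes_posteriors(priors, conditionals):
    """Return P(E_k | E) for every k, given P(E_k) and P(E | E_k)."""
    joint = [p * q for p, q in zip(priors, conditionals)]  # P(E_k and E)
    evidence = sum(joint)                                  # P(E), by Formula (6.1)
    if evidence == 0:
        raise ValueError("Bayes' formula requires P(E) > 0")
    return [j / evidence for j in joint]


# Example 6.3: hypotheses "has TB" and "no TB"; observed event "X ray positive".
priors = [Fraction(1, 1000), Fraction(999, 1000)]
conditionals = [Fraction(9, 10), Fraction(1, 100)]

posteriors = bayes_posteriors(priors, conditionals)
print(posteriors[0], round(float(posteriors[0]), 3))  # 10/121 0.083
```

The exact value 10/121 makes plain why the answer is so small: the 999 false positives expected per 100,000 people far outnumber the 90 true positives.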


We note here the terminology often used when Bayes' formula is
applied. The events $E_1, E_2, \ldots, E_n$ are called hypotheses, and they
are assumed to be disjoint and exhaustive. The probability $P(E_k)$ is
called the a priori probability of hypothesis $E_k$. The conditional
probability $P(E_k \mid E)$ is called the a posteriori probability of the
hypothesis $E_k$, given the observed event $E$. Thus, in Example 6.3, the
events $E_1$ (person has tuberculosis) and $E_1'$ (person does not have
tuberculosis) are the hypotheses. The a priori probability of a person
having tuberculosis is $P(E_1) = .001$. But the a posteriori probability of a
person having tuberculosis, given that his X ray is positive, is
$P(E_1 \mid E) = .083$.


                          .90   E  (positive X ray)    .00090
    .001   E_1 (has TB)
                          .10   E' (negative X ray)    .00010

                          .01   E  (positive X ray)    .00999
    .999   E_1' (no TB)
                          .99   E' (negative X ray)    .98901

                          Figure 15

In Figure 15, we have the tree diagram for Example 6.3 in which a 
person is first classified according to whether or not he has tubercu- 
losis and then according to whether his X ray is positive or negative. 
Probabilities given in the data of the example are written on the ap- 
propriate branches of the tree. To the right of each of the four pos- 
sible paths from left to right through the tree we have written the 
probability associated with that path. For example, 

P(has TB and X ray positive) = P(has TB)P(X ray positive | has TB)
= (.001)(.9) = .0009
is the probability for the topmost path through the tree in Figure 15. 

These path probabilities can also be recorded in tabular form, as 
in Table 15, where the entry in each cell is the probability of the 
intersection of the events given by the row and column in which the 
cell appears. If we add the entries in any row or column, then by 
(6.2), we obtain the probability of the event defining that row or 
column. These probabilities appear in the margins of the table. 


TABLE 15

                  E (X ray positive)            E' (X ray negative)

E_1 (has TB)      $P(E_1 \cap E) = .00090$      $P(E_1 \cap E') = .00010$      $P(E_1) = .001$

E_1' (no TB)      $P(E_1' \cap E) = .00999$     $P(E_1' \cap E') = .98901$     $P(E_1') = .999$

                  $P(E) = .01089$               $P(E') = .98911$               Total = 1


From the entries in Table 15, we can derive all possible conditional
probabilities. In particular,

$P(E_1 \mid E) = \dfrac{P(E_1 \cap E)}{P(E)} = \dfrac{.00090}{.01089} = .083$, approximately,

in agreement with (6.4).

Computing probabilities from the table or by Bayes' formula, we
can construct the tree diagram in Figure 16, which differs from the
diagram in Figure 15 because the order of events has been reversed.
In Figure 16, we think of a person first being classified according to
whether his X ray is positive or negative and then according to
whether or not he has tuberculosis. (Probabilities in Figure 16 are
rounded to four decimal place accuracy.) Using the language associated
with Bayes' formula, we have in Figure 15 the conditional
probabilities of the possible observed events given the various hypotheses,
whereas in Figure 16 we have conditional probabilities of the possible
hypotheses given the various observed events.


                                   .0826   E_1  (has TB)    .0009
    .0109   E  (positive X ray)
                                   .9174   E_1' (no TB)     .0100

                                   .0001   E_1  (has TB)    .0001
    .9891   E' (negative X ray)
                                   .9999   E_1' (no TB)     .9890

                                   Figure 16
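The passage from the tree of Figure 15 to the reversed tree of Figure 16 goes through the table of joint probabilities, and each step is mechanical. The sketch below is one possible way to carry it out; the dictionary names (`joint`, `p_result`, `reversed_tree`) are illustrative choices of ours, and the input numbers are those given in Example 6.3.

```python
from fractions import Fraction

# Data of Example 6.3: a priori probabilities and P(positive | state).
p_state = {"TB": Fraction(1, 1000), "no TB": Fraction(999, 1000)}
p_positive_given = {"TB": Fraction(9, 10), "no TB": Fraction(1, 100)}

# Cells of Table 15: P(state and result) = P(state) * P(result | state).
joint = {}
for state, p in p_state.items():
    joint[(state, "positive")] = p * p_positive_given[state]
    joint[(state, "negative")] = p * (1 - p_positive_given[state])

# Column margins P(E) and P(E'): add the entries of each column.
p_result = {
    result: sum(v for (_, r), v in joint.items() if r == result)
    for result in ("positive", "negative")
}

# Second-stage branches of Figure 16: P(state | result) = joint / margin.
reversed_tree = {
    (result, state): joint[(state, result)] / p_result[result]
    for (state, result) in joint
}

for (result, state), p in reversed_tree.items():
    print(f"P({state} | {result}) = {float(p):.4f}")
```

Working with exact fractions and rounding only when printing avoids the small discrepancies that arise if the rounded entries of Figure 16 are themselves multiplied together.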


We conclude with two more examples in which Bayes' formula 
proves useful. 


Example 6.4. Three urns contain colored balls as specified in Table 16.
One urn is chosen at random and a ball is withdrawn. It happens
to be red. What is the probability that it came from urn 2?


TABLE 16

Red White Blue

We let $E$ denote the event "ball selected is red." To account for
the occurrence of $E$ we have three hypotheses: $E_1$ (urn 1 selected),
$E_2$ (urn 2 selected), $E_3$ (urn 3 selected). Since the urn is chosen at
random,

$P(E_1) = P(E_2) = P(E_3) = \tfrac{1}{3}.$
We also are given the conditional probabilities, 


P(E|E) = $, PQ|E)-1, P(EE) = 4. 
Since $\{E_1, E_2, E_3\}$ is a partition of the sample space for this compound
experiment, Bayes' formula is applicable. Putting $k = 2$, $n = 3$ in
(6.3) we find
1 1 
POME GG .12 
G4D = gg xe + OW ~ 7 

The reader may find it helpful to construct tree diagrams like those
in Figures 15 and 16 for this example.


Example 6.5. After a severe flood, a warehouse finds itself stocked 
with boxes of flashbulbs from which identification labels have been 
washed off. There are three kinds of bulbs, each packed in units of 
100 in identical boxes: low quality, medium quality, and high quality. 
It is known that in the entire warehouse, the proportions of boxes 
with low, medium, and high quality bulbs are .25, .25, and .50, 
respectively. 

Since testing a flashbulb means destroying it, exhaustive testing of 
the bulbs is impractical. Instead, the distributor orders that two 
bulbs from each box be tested. The manufacturer, on the basis of 
past experience, estimates the conditional probabilities given in
Table 17.


TABLE 17

Conditional Probabilities of Finding x Defectives Given That
Two Bulbs Tested Were from Box of Known Quality

                                  Quality of Box
Number of Defectives       Low       Medium       High

         0                 .49        .64          .81
         1                 .42        .32          .18
         2                 .09        .04          .01


Suppose two bulbs are selected from a box, tested, and both are
found to fire satisfactorily. What is the probability that the box
contains high quality bulbs? Our hypotheses are the three events L, M,
H that the box contains low, medium, and high quality bulbs,
respectively. If we let $E$ denote the observed event that neither of the
two bulbs tested was defective, then by Bayes' formula we find

$P(H \mid E) = \dfrac{P(H)P(E \mid H)}{P(L)P(E \mid L) + P(M)P(E \mid M) + P(H)P(E \mid H)}$

$= \dfrac{(.50)(.81)}{(.25)(.49) + (.25)(.64) + (.50)(.81)} = .59$, approximately.


Proceeding in this way, we compute the a posteriori probabilities of
the three hypotheses, and so obtain Table 18.


TABLE 18

                             A Posteriori Probability Given That We Observe
Quality      A Priori
of Box       Probability     0 Defectives     1 Defective     2 Defectives

Low             .25              .18              .38             .60
Medium          .25              .23              .29             .27
High            .50              .59              .33             .13


We see that the most probable hypothesis in the event neither of the
two bulbs tested is defective is that the bulbs come from a high
quality box. But if one or both of the two tested bulbs are defective,
then the most probable hypothesis is that the bulbs come from a low
quality box.
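The a posteriori probabilities of Example 6.5 can be recomputed column by column from the a priori probabilities and the conditional probabilities of Table 17. The sketch below is illustrative only; the names `prior`, `table17`, and `posterior` are ours. One entry is an assumption: the probability of two defectives from a low quality box is taken to be .09, the value that makes the Low column sum to 1.

```python
from fractions import Fraction

# A priori proportions of low, medium, and high quality boxes.
prior = {"Low": Fraction(25, 100), "Medium": Fraction(25, 100), "High": Fraction(50, 100)}

# table17[x][quality] = P(x defectives among the two bulbs tested | quality).
# The Low entry for x = 2 (.09) is assumed so that the Low column sums to 1.
table17 = {
    0: {"Low": Fraction(49, 100), "Medium": Fraction(64, 100), "High": Fraction(81, 100)},
    1: {"Low": Fraction(42, 100), "Medium": Fraction(32, 100), "High": Fraction(18, 100)},
    2: {"Low": Fraction(9, 100), "Medium": Fraction(4, 100), "High": Fraction(1, 100)},
}


def posterior(defectives):
    """Bayes' formula: P(quality | observed number of defectives)."""
    joint = {q: prior[q] * table17[defectives][q] for q in prior}
    evidence = sum(joint.values())
    return {q: joint[q] / evidence for q in prior}


# One column of a posteriori probabilities for each possible observation.
table18 = {
    x: {q: round(float(p), 2) for q, p in posterior(x).items()}
    for x in table17
}

for x, column in table18.items():
    print(x, column)
```

Each printed line corresponds to one of the three "Given That We Observe" columns, so the most probable hypothesis for each observation can be read off directly.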

Calculations of this sort using Bayes' formula are quite common in
statistical decision theory. The reader interested in further details
can consult References 2 or 10 listed at the end of this chapter.


PROBLEMS 


6.1. From a group of four boys and two girls, first one child is selected at
random and then, from the remaining five children, another child is
selected at random. Find the probability that the second child selected
will be a girl (a) from first principles, i.e., by defining a suitable sample
space, assigning probabilities to its simple events, etc.; (b) by use of
Theorem 6.1.

6.2. Refer to Example 5.5 and construct a tree diagram in which the selected
ball is identified first by its color and then by the urn from which it
was drawn. Find probabilities associated with each branch of the tree,
as well as for each path through the tree. Compare with Figure 12.

6.3. Refer to Example 6.1 of the text. Construct a tree diagram in which a
fraternity member is first classified according to whether he is enrolled
in a mathematics course and then according to his class. Find
probabilities associated with each branch of the tree as well as for each path
through the tree.

6.4. In the Polya urn model of Problem 5.10, find

(a) The probability that the first ball is green.

(b) The probability that the second ball is green.

(c) The probability that the third ball is green. [In view of the answers
in (a), (b), and (c), are you willing to make a conjecture? Try
proving it!]

6.5. This problem should be done from first principles and also by using
Bayes' formula. Three identical boxes each contain two coins. In one
box both are pennies, in one both are nickels, and in the third there is
one penny and one nickel. A man chooses a box at random and takes
out a coin. If the coin is a penny, what is the probability that the other
coin in the box is also a penny?

6.6. (a) Bolts are made by two machines A and B, but A produces twice as
many bolts as B in a given time. A is known to produce two percent
defectives and B one percent defectives. A bolt is examined and
found to be defective. What are the probabilities a priori and a
posteriori that the bolt was produced by A?

(b) Suppose $n_1$ to $n_2$ is the ratio of the number of bolts produced by A
to the number produced by B. Let $p_1$ and $p_2$ denote the proportion
of defectives produced by A and B, respectively. Suppose a bolt
is tested and found to be defective. Show that if $n_1 p_1 > n_2 p_2$, then
the a posteriori probability that the bolt was produced by A is
greater than the a posteriori probability that the bolt was produced
by B.

6.7. Mr. Smith, having lived in his city many years, estimates the a priori
probability that today's weather will be inclement is .2. (He thinks
today will be fair with probability .8.) Mr. Smith listens to an early
morning weather forecast to get some information on the day's weather.
The forecaster makes one of three predictions: fair weather, inclement
weather, uncertain weather. Mr. Smith has made estimates of
conditional probabilities of the different predictions given the day's weather,
as shown in Table 19. For example, he believes that of the fair days
70% are correctly forecast, 20% are forecast as inclement and 10%
as uncertain.


TABLE 19

                                 Forecast
Day's Weather        Fair       Inclement       Uncertain

Fair                  .7           .2              .1
Inclement


Suppose Mr. Smith hears the forecaster predict fair weather. What is
the a posteriori probability of fair weather?

6.8. In a T-maze, a laboratory animal is given a choice of going to the left
and getting food or going to the right and receiving a mild electric
shock. Before any conditioning (in trial number 1) animals are equally
likely to go to the left or right in the maze. Having received food on any
trial, the probabilities of going to the left and right become .6 and .4,
respectively, on the following trial. Having received the electric shock
on any trial, the probabilities of going to the left and right on the next
trial are .8 and .2, respectively.

(a) What is the probability that the animal will turn left on trial
number 2? On trial number 3?

(b) Let us denote by $p_n$ the probability that the animal will turn left
on trial number $n$. Derive an equation relating $p_n$ and $p_{n-1}$ and
use this equation to find a general formula for $p_n$ in terms of $p_1$
and $n$.

6.9. Refer to Problem 5.11 and suppose the selected socks are of the same
color. What is the probability that they are black?

6.10. A multiple-choice test question lists five alternative answers, of which
just one is correct. If a student has done his homework, then he is
certain to identify the correct answer; otherwise, he chooses an answer
at random. Let $p$ denote the probability of the event $E$ that a student
does his homework, and let $F$ be the event that he answers the question
correctly.

(a) Find a formula for $P(E \mid F)$ in terms of $p$.

(b) Show that $P(E \mid F) \ge P(E)$ for all values of $p$. When does the
equality hold?

(c) Suppose the test lists $n$ alternative answers of which only one is
correct. Now find $P(E \mid F)$ in terms of $n$ and $p$, and show that if $p$ is
fixed but unequal to 0 or 1, then $P(E \mid F)$ increases as $n$ increases.
Is this result reasonable?

6.11. Of the freshmen in a certain college, it is known that 40% attended
private secondary schools and 60% attended public schools. The
registrar reports that 30% of all students who attended private secondary
schools but only 20% of those who attended public schools attain A
averages in their freshman year. At the end of the year, one student is
chosen at random from the freshman class and he has an A average.

What is the conditional probability that the student attended public
schools?

6.12. You know that urn A contains two green and one red ball and urn B
contains three green and two red balls. One of these urns is selected
at random, but you don't know which one is selected. You may perform
one of the following experiments before guessing which urn was selected.

(i) Take one ball out of the selected urn and observe its color.

(ii) Take two balls out of the selected urn, replacing the first before
drawing the second, and observe their colors.

(iii) Same as (ii), except that you do not replace the first ball before
drawing the second.

Whichever experiment you choose, as soon as its outcome is known you
compute the a posteriori probabilities of urns A and B being selected,
given the observed outcome. You then guess the urn whose a posteriori
probability is larger.

(a) For each of the three experiments, determine which urn you guess
for each possible experimental outcome.

(b) For each of the three experiments, calculate the probability that
you actually guess correctly which urn was selected. Which
experiment leads to the highest probability of guessing correctly? [It is
interesting to observe that most people, when offered a choice of
one of the three experiments, prefer experiment (iii).]

6.13. Let $\{E_1, E_2, \ldots, E_n\}$ be a partition of a sample space $S$ and let $E$ be
any subset of $S$. Show that

$\{E \cap E_1, E \cap E_2, \ldots, E \cap E_n\}$

is a partition of $E$.

6.14. Let a universal set $U$ of people be given. Let $M$, $F$, $C$, $A$, $H$, and $S$
denote the subsets of male, female, child, adult, healthy, and sick people
respectively. Then we can form the following partitions of $U$:

$P_1 = \{M, F\}, \quad P_2 = \{C, A\}, \quad P_3 = \{H, S\}.$

From $P_1$ and $P_2$ we can form a new partition, namely

$P_4 = \{M \cap C, M \cap A, F \cap C, F \cap A\},$

in which people are classified both according to sex and age. $P_4$ is
called the cross-partition of $P_1$ and $P_2$. Analogously we can classify
people according to their health in addition to sex and age. We thus
are led to the partition

$P_5 = \{M \cap C \cap H, M \cap A \cap H, F \cap C \cap H, F \cap A \cap H,$
$M \cap C \cap S, M \cap A \cap S, F \cap C \cap S, F \cap A \cap S\},$

which is called the cross-partition of $P_3$ and $P_4$.
From these illustrative examples, formulate a reasonable definition
for the cross-partition of any two partitions of an arbitrary set $E$. Can
you prove that a cross-partition of $E$ is a partition of $E$?


7. Independent events 


As we have seen, the probability of $E$ and the conditional probability
of $E$ given $F$ are generally unequal, although they can be equal.
The case of equality,

(7.1)   $P(E \mid F) = P(E),$

is especially important, for (7.1) expresses the fact that knowing $F$
has occurred does not change the probability of $E$ having occurred.
If (7.1) holds, we shall say that $E$ is independent of $F$. Let us note
that this relation between the events $E$ and $F$ is defined only if $F$
has positive probability, i.e., only if $P(E \mid F)$ is meaningful.

Assuming $P(E) > 0$ and $P(F) > 0$, we rewrite (5.4) here,

(7.2)   $P(E \cap F) = P(E)P(F \mid E) = P(F)P(E \mid F),$

and from the equality on the right deduce that if (7.1) is true, then

(7.3)   $P(F \mid E) = P(F)$


is also true. We have thereby proved the following result. 


Theorem 7.1. Let $E$ and $F$ be events with positive probability. If
$E$ is independent of $F$, then $F$ is independent of $E$.


In words, if knowledge that $F$ occurs does not change the probability
of $E$, then knowledge that $E$ occurs does not change the probability
of $F$. Thus, if $P(E) > 0$ and $P(F) > 0$ so that the conditional
probabilities in (7.1) and (7.3) are defined, then these equations must
both be true or both be false. When either is true, we find from (7.2)
that

(7.4)   $P(E \cap F) = P(E)P(F).$


An important definition is based on equation (7.4). 


Definition 7.1. Two events $E$ and $F$ are said to be independent
events if and only if Equation (7.4) holds; i.e., the probability that
both $E$ and $F$ occur is the product of the probability that $E$ occurs
and the probability that $F$ occurs. Two events that are not
independent are said to be dependent events. We shall refer to Equation
(7.4) as the multiplication rule for the events $E$ and $F$.


In the literature, one often finds two independent events referred
to as "mutually independent," "stochastically independent,"
"independent in the sense of probability," or "statistically independent."
We shall use the simpler language of Definition 7.1.

There is a reason for using Equation (7.4), rather than (7.1) or
(7.3), as a means of defining independent events. With the latter
equations, events $E$ and $F$ with zero probability would be excluded
from our definition. No such restriction is involved when Equation
(7.4) is used. In fact, it is easy to see from Definition 7.1 that if
$P(E) = 0$ and $F$ is any event, then $E$ and $F$ are independent. For
$E \cap F$ is a subset of $E$, and so $P(E \cap F) \le P(E)$ by Theorem 4.2.
Since we are assuming $P(E) = 0$, it follows that $P(E \cap F) = 0$.
Thus Equation (7.4) holds, and this proves that $E$ and $F$ are
independent events, as claimed.

Whether or not two events $E$ and $F$ are independent is a question
that we can answer in our present state of knowledge only by showing
that Equation (7.4) does or does not hold. Although we will often
have the intuitive feeling that two specified events $E$ and $F$ are
independent, our intuition must be checked by computing $P(E)$, $P(F)$,
and $P(E \cap F)$, and then verifying that the multiplication rule (7.4)
is true. We give three examples.

Example 7.1. A green and red die are rolled. Let $E$ be the event
"six on green die" and $F$ the event "five on red die." We choose the
familiar sample space containing 36 outcomes and assign probability
$\tfrac{1}{36}$ to each simple event. Then

$P(E) = \tfrac{6}{36} = \tfrac{1}{6}, \quad P(F) = \tfrac{6}{36} = \tfrac{1}{6}, \quad P(E \cap F) = \tfrac{1}{36}.$

Equation (7.4) holds and, therefore, $E$ and $F$ are independent events.

Example 7.2. Two fair coins are tossed. Let $E$ be the event "not
more than one head" and $F$ the event "at least one of each face."
Define the sample space $S = \{HH, HT, TH, TT\}$ and assign
probability $\tfrac{1}{4}$ to each simple event. Then

$P(E) = \tfrac{3}{4}, \quad P(F) = \tfrac{1}{2}, \quad \text{and} \quad P(E \cap F) = \tfrac{1}{2}.$

Hence

$P(E \cap F) \neq P(E)P(F),$

and so $E$ and $F$ are dependent events.

Example 7.3. Three fair coins are tossed. Let $E$ and $F$ be events
as described in Example 7.2. Now the sample space $S$ contains the
familiar eight elements HHH, HHT, ..., TTT. We assign probability
$\tfrac{1}{8}$ to each simple event of $S$. Then

$P(E) = \tfrac{1}{2}, \quad P(F) = \tfrac{3}{4}, \quad P(E \cap F) = P(\{HTT, THT, TTH\}) = \tfrac{3}{8}.$

Now the multiplication rule (7.4) holds, and $E$ and $F$ are independent
events.


It is not unusual for students to feel that the events $E$ and $F$
should either be independent in both Examples 7.2 and 7.3, or
dependent in both. But our intuition is not to be trusted; Equation
(7.4) must be relied upon to determine if two events are independent
or dependent. [See Problem 7.3.]
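Since intuition is unreliable here, it can be reassuring to let a short program enumerate the sample space and test the multiplication rule directly. The following is a sketch under our own naming (the function `check_independence` is not from the text); it lists all sequences of n tosses of a fair coin, forms the events E ("not more than one head") and F ("at least one of each face"), and compares P(E ∩ F) with P(E)P(F).

```python
from fractions import Fraction
from itertools import product


def check_independence(n):
    """Return P(E), P(F), P(E and F), and whether rule (7.4) holds for n tosses."""
    outcomes = list(product("HT", repeat=n))   # 2**n equally likely outcomes
    E = [w for w in outcomes if w.count("H") <= 1]      # not more than one head
    F = [w for w in outcomes if "H" in w and "T" in w]  # at least one of each face
    both = [w for w in E if w in F]
    total = len(outcomes)
    p_e = Fraction(len(E), total)
    p_f = Fraction(len(F), total)
    p_ef = Fraction(len(both), total)
    return p_e, p_f, p_ef, p_ef == p_e * p_f


print(check_independence(2))  # Example 7.2: rule fails, events dependent
print(check_independence(3))  # Example 7.3: rule holds, events independent
```

Calling the function with other values of n is a quick way to form a conjecture before attempting a proof.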

If the probability of $E$ is unchanged by the knowledge that $F$
occurred, then it seems reasonable that it should also be unchanged
by the knowledge that $F$ did not occur. This and related
observations are made precise in the following result.


Theorem 7.2. Let $E$ and $F$ be independent events. Then the
following pairs of events are also independent: (i) $E$ and $F'$, (ii) $E'$ and
$F$, (iii) $E'$ and $F'$.

Proof. We prove (i) and leave (ii) and (iii) for the reader. (See
Problem 7.4.) In view of Definition 7.1, to prove $E$ and $F'$ are
independent events, we must prove that the multiplication rule (7.4)
holds for $E$ and $F'$; i.e.,

(7.5)   $P(E \cap F') = P(E)P(F').$

Now, by the result of Problem 4.13(e),

$P(E \cap F') = P(E) - P(E \cap F) = P(E) - P(E)P(F),$

since the multiplication rule holds for $E$ and $F$ by hypothesis. Hence,

$P(E \cap F') = P(E)[1 - P(F)] = P(E)P(F'),$

by Theorem 4.6. Thus, Equation (7.5) holds and the proof of (i) is
complete.
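Theorem 7.2 can be illustrated on the two dice of Example 7.1 by enumerating the 36 outcomes and testing the multiplication rule for each of the four pairs of events. This is a numerical illustration for one sample space, not a proof; the helper names `prob` and `independent` are our own.

```python
from fractions import Fraction
from itertools import product

# Sample space of Example 7.1: ordered pairs (green die, red die).
outcomes = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event when all 36 outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


def independent(a, b):
    """Multiplication rule (7.4): P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


E = {w for w in outcomes if w[0] == 6}  # six on green die
F = {w for w in outcomes if w[1] == 5}  # five on red die
E_c = outcomes - E                      # complement E'
F_c = outcomes - F                      # complement F'

results = {
    "E and F": independent(E, F),
    "E and F'": independent(E, F_c),
    "E' and F": independent(E_c, F),
    "E' and F'": independent(E_c, F_c),
}
print(results)
```

Representing events as Python sets keeps the code close to the notation of the text: `&` plays the role of intersection and set difference from `outcomes` gives the complement.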


Example 7.4. Suppose A has probability $p_A$ of surviving one year
and B has probability $p_B$ of surviving one year. If we assume that
event $E$ (A survives one year) and event $F$ (B survives one year) are
independent, then we have the following possibilities and their
probabilities:


Event              Verbal Description                          Probability

$E \cap F$         Both A and B survive 1 year                 $p_A p_B$
$E \cap F'$        A survives 1 year but B does not            $p_A(1 - p_B)$
$E' \cap F$        A does not survive 1 year but B does        $(1 - p_A)p_B$
$E' \cap F'$       Neither A nor B survives 1 year             $(1 - p_A)(1 - p_B)$


PROBLEMS


7.1. A card is drawn at random from a standard deck of 52 cards. Let $E$ be
the event that the card is a spade, $F$ the event that the card is a deuce,
and $G$ the event that the card is a deuce or a trey. Determine which
of the following pairs of events are independent.

(a) $E$ and $F$ (b) $E$ and $G$ (c) $F$ and $G$.

7.2. Refer to the mortality table in Problem 5.13. Mr. Smith is now aged 21
and Mr. Jones is now aged 23. Each man was issued insurance one
year ago. Assuming that the events "Smith survives to age 22" and
"Jones survives to age 24" are independent, calculate the probability
that at least one of the men dies within one year.

7.3. (a) Four coins are tossed. Let $E$ and $F$ be the events described in
Example 7.2. Show that $E$ and $F$ are dependent.

(b) Let $n$ coins be tossed, where $n$ is any positive integer greater than 1.
Let $E$ and $F$ be the events described in Example 7.2. Show that $E$
and $F$ are independent events if and only if $n = 3$.

7.4. Complete the proof of Theorem 7.2 by proving (ii) and (iii).
7.5. Consider the data in Table 20 on the smoking habits of a sample of 
females in the United States. 


TABLE 20

Distribution of Females 18-24 Years of Age, by Current Amount of
Smoking and by Income, February 1955


Percentage Distribution 


Not Regular Smokers Regular Smokers i 


Tacome 7 
P af na Nonsmokers | Occasional 
ersons |____}_ Smokers 


ee 
% E % % 

None 3335 64.1 25 44 100 
Under $1000 1077 65.1 2.9 6.2 100 
31000-1999 inz 645 30 4.0 100 
$2000-2999 050 593 08 11.6 100 
$3000- 375 40.5 65 13.1 100 

Total: 7460 62.6 2.6 6.0 13.4 14.3 1.1 100 


Source: Tobacco Smoking in the United States in Relation to Income, Marketing Research 
Report No. 189, U.S. Dept. of Agriculture, Washington D. C., July 1957, page 110. 
Suppose that a single female is selected at random from the 7460
females making up this sample.

(a) Define the sample space $S$ for this experiment. What probability is
to be assigned to each simple event of $S$?

(b) Find the probability that the female selected has an income of at
least $3000.

(c) Find the probability that the female selected smokes between 10
and 20 cigarettes per day on the average.

(d) Find the conditional probability that the female selected smokes
between 10 and 20 cigarettes per day on the average, given that
she has an income of at least $3000.

(e) Find the probability that the female selected has an income of at
least $3000 and also smokes between 10 and 20 cigarettes per day
on the average.

(f) Are the events "female selected has an income of at least $3000"
and "female selected smokes between 10 and 20 cigarettes daily on
the average" dependent or independent events?

7.6. One student is selected at random from the summer school students of
Problem 5.6. Are the events "student failed Chemistry" and "student
failed History" independent or dependent?

7.7. From a pack of playing cards, two cards are drawn successively, the
first being replaced before the second is drawn. Let $E$ be the event
"first card is a spade," $F$ the event "second card is not a king," and
$G$ the event "first card is an ace or a king." Determine which (if any) of
the three pairs of events $E$ and $F$, $F$ and $G$, $E$ and $G$ are independent.

7.8. Repeat Problem 7.7, but now assume that the first card is not replaced
before the second is drawn.

7.9. Show that if $E$ is any event and $P(F) = 1$, then $E$ and $F$ are
independent.

7.10. Of what events $E$ can it be said that the events $E$ and $E$ are
independent?

7.11. Let $E$, $F$, and $G$ be three events. We are told that $E$ and $F$ are
independent events, and that $F$ and $G$ are independent events. Does it
follow that $E$ and $G$ are independent events? Defend your answer.

7.12. Of the three events $E$, $F$, and $G$, we know that $E$ and $F$ are independent
and $G \subset E$. Does it follow that $G$ and $F$ are independent? Defend
your answer.

7.13. The 1957-1958 Combined Membership List of the American
Mathematical Society (S), the Mathematical Association of America (A), and
the Society for Industrial and Applied Mathematics (I) gives the
following information for the 46 members listed on page 1.


Memberships          Number

S only                 16
A only                 15
I only                  7
S and A                 6
A and I                 1
S and I                 0
S and A and I           1


One person is selected at random from this group of 46 people.

(a) Show that the events "person belongs to the American
Mathematical Society" and "person belongs to the Mathematical
Association of America" are dependent.

(b) Assuming everyone else maintains their memberships, how many 
of the 16 members of only the American Mathematical Society 
must also become members of the Mathematical Association of 
‘America in order that the events in (a) be independent? 


744. Two partitions of S, say {En Es +++, En) and (Fi Fo, +++, Fm} are 
defined to be independent if 

P(E Fj) = P(E)P (Fi) 
foi-21,2,::5" andj = L2,» i.e., if the multiplication rule 
(7.4) holds for every pair of events formed by taking one event from 
each partition. Sh ts E and F are independent if and 


ow that the even 
only if the partitions {E, E") and (F, F'} are independent. 
8. Independence of several events 

In this section, we generalize the notion of independence to an
arbitrary (but finite) number of events. Let us first consider the
special case of three events $E_1$, $E_2$, and $E_3$.

Definition 8.1. The events $E_1$, $E_2$, and $E_3$ are pairwise independent
(or independent in pairs) if all of the possible pairs of events (i.e.,
$E_1$ and $E_2$, $E_1$ and $E_3$, $E_2$ and $E_3$) are independent.

Thus, if $E_1$, $E_2$, and $E_3$ are pairwise independent, then the
multiplication rule (7.4) holds for each pair of events:

$$
\begin{aligned}
P(E_1 \cap E_2) &= P(E_1)P(E_2), \\
P(E_1 \cap E_3) &= P(E_1)P(E_3), \\
P(E_2 \cap E_3) &= P(E_2)P(E_3).
\end{aligned}
\tag{8.1}
$$

Let the reader note that we have defined what we mean by the
pairwise independence of three events, but we have not yet said what
is meant by the phrase, “$E_1$, $E_2$, and $E_3$ are independent events.”
Nevertheless, we do think of reasonable consequences that such a
definition should entail. For example, we would like to be able to
show that if $E_1$, $E_2$, and $E_3$ are independent events, then the two
events $(E_1 \cap E_2)$ and $E_3$ are also independent. Does this consequence
follow if we assume only that $E_1$, $E_2$, and $E_3$ are pairwise independent?
This question amounts to asking if the equations in (8.1) imply
that the multiplication rule holds for $(E_1 \cap E_2)$ and $E_3$, i.e., whether
or not

$$P((E_1 \cap E_2) \cap E_3) = P(E_1 \cap E_2)P(E_3) \tag{8.2}$$

follows from (8.1).
But $(E_1 \cap E_2) \cap E_3 = E_1 \cap E_2 \cap E_3$, and if we use (8.1) to
simplify $P(E_1 \cap E_2)$, then (8.2) becomes

$$P(E_1 \cap E_2 \cap E_3) = P(E_1)P(E_2)P(E_3). \tag{8.3}$$

We are thus led to inquire whether Equations (8.1) imply Equation
(8.3). That the assumption of pairwise independence of $E_1$, $E_2$, and
$E_3$ does not imply (8.3) is shown by the following example.

Example 8.1. To control the quality of a manufacturing process,
each unit produced passes through three inspections. Of four units
A, B, C, and D it is known that A passed only inspection 1, B passed
only inspection 2, C passed only inspection 3, and D passed all three
inspections. One of the four units is selected at random. Let
$E_1$ = “unit passed inspection 1,” $E_2$ = “unit passed inspection 2,”
and $E_3$ = “unit passed inspection 3.” Then

$$P(E_1) = P(E_2) = P(E_3) = \tfrac{2}{4} = \tfrac{1}{2},$$
$$P(E_1 \cap E_2) = P(E_1 \cap E_3) = P(E_2 \cap E_3) = \tfrac{1}{4},$$

so that all three equations in (8.1) hold. Thus the events $E_1$, $E_2$, and
$E_3$ are pairwise independent. But (8.3) does not hold, since

$$P(E_1 \cap E_2 \cap E_3) = \tfrac{1}{4} \ne P(E_1)P(E_2)P(E_3) = \tfrac{1}{8}.$$

We conclude that the pairwise independence of $E_1$, $E_2$, and $E_3$ does
not imply the independence of $(E_1 \cap E_2)$ and $E_3$.
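
The computations of Example 8.1 can be checked by direct enumeration. The following is only a sketch: it assumes the reading of the example in which B passed only inspection 2, C passed only inspection 3, and D passed all three, and the names `units`, `passed`, and `prob` are introduced here for illustration.

```python
from fractions import Fraction

# Assumed data: each unit maps to the set of inspections it passed
# (A only 1, B only 2, C only 3, D all three).
units = {"A": {1}, "B": {2}, "C": {3}, "D": {1, 2, 3}}

def prob(event):
    """Probability of a set of units when one unit is selected at random."""
    return Fraction(len(event), len(units))

def passed(i):
    """The event 'selected unit passed inspection i' as a set of unit names."""
    return {u for u, inspections in units.items() if i in inspections}

E1, E2, E3 = passed(1), passed(2), passed(3)

# The three equations in (8.1).
pairwise = all(
    prob(X & Y) == prob(X) * prob(Y)
    for X, Y in [(E1, E2), (E1, E3), (E2, E3)]
)

# The two sides of Equation (8.3).
triple_lhs = prob(E1 & E2 & E3)
triple_rhs = prob(E1) * prob(E2) * prob(E3)

print(pairwise, triple_lhs, triple_rhs)  # True 1/4 1/8
```

Exact fractions are used so that the comparison of the two sides of each multiplication rule is not affected by rounding.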


From this example (see also Problems 8.1-8.3), it is clear that the
definition of independence for more than two events requires care.
For three events, a suitable definition is obtained by demanding that
Equation (8.3) hold in addition to the three equations in (8.1). We
shall find it convenient to refer to Equation (8.3) as the multiplication
rule for the events $E_1$, $E_2$, and $E_3$.

Definition 8.2. Three events $E_1$, $E_2$, and $E_3$ are said to be
independent if and only if the multiplication rule holds for all
combinations of two or more of the events.

The three equations in (8.1) express the multiplication rule for the
three pairs of events obtainable from $E_1$, $E_2$, and $E_3$. Equation (8.3)
is the multiplication rule for all three events. To say that $E_1$, $E_2$,
and $E_3$ are independent is to say that all four of these equations are
true.

It is now possible to prove that certain expected consequences do
indeed follow from Definition 8.2.

Theorem 8.1. Let $E_1$, $E_2$, and $E_3$ be independent events. Then the
following events are also independent:

(a) $E_1$ and $(E_2 \cap E_3)$          (b) $E_2$ and $(E_1 \cup E_3)$

(c) $E_1'$ and $(E_2 \cap E_3')$        (d) $E_1'$, $E_2'$, and $E_3'$

More generally, $E_1$ and any event expressible in terms of $E_2$ and $E_3$
are independent, $E_2$ and any event expressible in terms of $E_1$ and $E_3$
are independent, etc.

Proof. We prove (c) here, and leave (a), (b), and (d) for the reader.
The general result can be proved by considering these and all other
similar combinations of the three events $E_1$, $E_2$, and $E_3$.
To prove (c) requires (by Definition 7.1) that we prove

$$P(E_1' \cap (E_2 \cap E_3')) = P(E_1')P(E_2 \cap E_3'). \tag{8.4}$$


We are given that $E_1$, $E_2$, and $E_3$ are independent, so that Equations
(8.1) and (8.3) are true by hypothesis.

By drawing an appropriate Venn diagram and resorting to the
fundamental definition of the probability of an event, let the reader
verify that

$$P(E_1' \cap (E_2 \cap E_3')) = P(E_2 \cap E_3') - P(E_1 \cap E_2) + P(E_1 \cap E_2 \cap E_3). \tag{8.5}$$

(Using the “numbers of flags” language introduced in Section 4, the
proof of (8.5) amounts to noting that one obtains the sum of the
numbers on all flags in $E_1' \cap E_2 \cap E_3'$, each counted once, by adding
the sum of the numbers on all flags in $E_2 \cap E_3'$ and in $E_1 \cap E_2 \cap E_3$
and then subtracting the numbers on all flags in $E_1 \cap E_2$, their
sum being $P(E_1 \cap E_2)$.)
By using Equations (8.1) and (8.3), we find

$$
\begin{aligned}
P(E_1' \cap (E_2 \cap E_3'))
&= P(E_2 \cap E_3') - P(E_1)P(E_2) + P(E_1)P(E_2)P(E_3) \\
&= P(E_2 \cap E_3') - P(E_1)P(E_2)[1 - P(E_3)] \\
&= P(E_2 \cap E_3') - P(E_1)P(E_2)P(E_3').
\end{aligned}
\tag{8.6}
$$

But since $E_2$ and $E_3$ are independent by hypothesis, it follows from
Theorem 7.2 that $E_2$ and $E_3'$ are also independent, and so the
multiplication rule holds for $E_2$ and $E_3'$. Hence, continuing from (8.6),

$$
\begin{aligned}
P(E_1' \cap (E_2 \cap E_3'))
&= P(E_2 \cap E_3') - P(E_1)P(E_2 \cap E_3') \\
&= [1 - P(E_1)]P(E_2 \cap E_3') \\
&= P(E_1')P(E_2 \cap E_3').
\end{aligned}
$$

This completes the proof of part (c) of Theorem 8.1.
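
Part (c) of Theorem 8.1 can also be checked numerically on a small model. The sketch below is not part of the proof; the three probabilities in `p` are hypothetical values chosen only for illustration, and independence of the three events is built in by giving each sample point the product weight.

```python
from fractions import Fraction
from itertools import product

# Hypothetical probabilities P(E1), P(E2), P(E3), chosen for illustration.
p = [Fraction(1, 2), Fraction(3, 5), Fraction(4, 5)]

# Sample points are triples of 0/1 flags; flag i equal to 1 means the
# (i+1)-th event occurs. Product weights make E1, E2, E3 independent.
space = {}
for w in product([0, 1], repeat=3):
    weight = Fraction(1)
    for flag, pi in zip(w, p):
        weight *= pi if flag else 1 - pi
    space[w] = weight

def prob(pred):
    """Sum the weights of all sample points satisfying the predicate."""
    return sum(wt for w, wt in space.items() if pred(w))

def not_E1(w):
    return w[0] == 0

def E2_and_not_E3(w):
    return w[1] == 1 and w[2] == 0

# Left and right sides of Equation (8.4).
lhs = prob(lambda w: not_E1(w) and E2_and_not_E3(w))
rhs = prob(not_E1) * prob(E2_and_not_E3)

print(lhs, rhs)  # 3/50 3/50
```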


The following example shows how this theorem is applied. 


Example 8.2. One shot is fired from each of three guns. Let $E_1$, $E_2$,
$E_3$ denote the events that the target is hit by the first, second, and
third gun, respectively. Suppose

$$P(E_1) = 0.5, \quad P(E_2) = 0.6, \quad P(E_3) = 0.8.$$

Assuming $E_1$, $E_2$, $E_3$ are independent events, what is the probability
that exactly one hit is registered?

Since the one hit can be made by any gun, the required probability
is given by

$$P(E_1 \cap E_2' \cap E_3') + P(E_1' \cap E_2 \cap E_3') + P(E_1' \cap E_2' \cap E_3).$$

By the independence assumption and Theorem 8.1, each of these
probabilities can easily be evaluated. For example,

$$P(E_1 \cap E_2' \cap E_3') = P(E_1)P(E_2')P(E_3') = (0.5)(0.4)(0.2) = 0.04.$$

In this way, we find the probability of exactly one hit is 0.26. (See
Problem 8.5.)
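
The value 0.26 can be confirmed by summing the product probabilities over all hit/miss patterns containing exactly one hit. This is a minimal sketch; the helper name `prob_exactly` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hit probabilities of the three guns, as given in Example 8.2.
hit = [Fraction(5, 10), Fraction(6, 10), Fraction(8, 10)]

def prob_exactly(k):
    """Probability of exactly k hits, assuming the three events are independent."""
    total = Fraction(0)
    for pattern in product([True, False], repeat=len(hit)):
        if sum(pattern) != k:
            continue
        term = Fraction(1)
        for is_hit, p in zip(pattern, hit):
            term *= p if is_hit else 1 - p
        total += term
    return total

one_hit = prob_exactly(1)
print(one_hit, float(one_hit))  # 13/50 0.26
```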


Definition 8.2 has been so formulated that it can be used as a 
definition of independence for any finite number of events. 


Definition 8.3. The n events $E_1, E_2, \ldots, E_n$ $(n \ge 2)$ are said to be
independent if and only if the multiplication rule holds for all
combinations of two or more of the events, i.e., if and only if we have

$$
\begin{aligned}
P(E_i \cap E_j) &= P(E_i)P(E_j) && (1 \le i < j \le n), \\
P(E_i \cap E_j \cap E_k) &= P(E_i)P(E_j)P(E_k) && (1 \le i < j < k \le n), \\
P(E_i \cap E_j \cap E_k \cap E_l) &= P(E_i)P(E_j)P(E_k)P(E_l) && (1 \le i < j < k < l \le n), \\
&\;\;\vdots \\
P(E_1 \cap E_2 \cap \cdots \cap E_n) &= P(E_1)P(E_2) \cdots P(E_n).
\end{aligned}
\tag{8.7}
$$

How many defining conditions must be checked if n events are to
be proved independent? Let us think of the set $\mathcal{A}$ containing as
elements the n events $E_1, E_2, \ldots, E_n$. The set $\mathcal{A}$ has exactly $2^n$ subsets.
The multiplication rule is required to hold for all subsets of $\mathcal{A}$
containing at least two events. There is one null subset and there are n
unit subsets for which multiplication rules are not required. Hence,
there are $2^n - n - 1$ equations summarized in (8.7).

Let us observe that Definition 8.3 implies that if n events are
independent, then any smaller number of events taken from these n are
also independent.

In our later work, we shall find many applications of the important
idea of independence of events. Most of these involve “independent
trials of an experiment” or “experiments repeated independently
under identical conditions.” In the next section, we present a
mathematical formulation of these important concepts.
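
The count $2^n - n - 1$ of the equations in (8.7) can be confirmed by listing, for small n, every combination of two or more events. The function name `count_conditions` below is introduced only for this illustration.

```python
from itertools import combinations

def count_conditions(n):
    """Count the subsets of n events that contain at least two events."""
    return sum(
        1
        for size in range(2, n + 1)
        for _ in combinations(range(n), size)
    )

# Compare the brute-force count with the closed form for several n.
for n in range(2, 9):
    assert count_conditions(n) == 2**n - n - 1

print(count_conditions(3), count_conditions(4))  # 4 11
```

For n = 3 the four conditions are the three equations in (8.1) together with Equation (8.3).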


PROBLEMS 


8.1. A green and a red die are rolled. Let $E_1$ = “6 on red die,” $E_2$ = “6 on
green die,” and $E_3$ = “sum of numbers on two dice is odd.” Show that
$E_1$, $E_2$, and $E_3$ are pairwise independent but are not independent.

8.2. A card is selected at random from a standard deck. Let $E_1$ = “card is
a spade or a club,” $E_2$ = “card is a spade,” and $E_3$ = “card is ace,
king, ..., 8 of diamonds or the ace of spades.” Show that Equation
(8.3) holds, but that none of the three equations in (8.1) is true.

8.3. Suppose of three events $E_1$, $E_2$, and $E_3$ it is known that $E_1$ and $E_2$ are
independent and that the multiplication rule given in Equation (8.3)
applies to all three events. Prove that $E_1 \cap E_2$ and $E_3$ are independent,
but show by example that $E_1$ and $E_3$ need not be independent.

8.4. A coin is tossed three times in succession. We define the customary
sample space

S = {HHH, HHT, ..., TTT},

and let $E_1$, $E_2$, and $E_3$ denote the events that a head is tossed on the
first, second, and third toss, respectively. Suppose we require

$$P(E_1) = P(E_2) = P(E_3) = p,$$

where $0 < p < 1$, and also that $E_1$, $E_2$, and $E_3$ are independent. Show
that there is one and only one acceptable assignment of probabilities
to the simple events of S that is consistent with these assumptions.

8.5. Refer to Example 8.2 and find the most probable number of times that
the target is hit.

8.6. Let all pairs of events from $E_1$, $E_2$, and $E_3$ be mutually exclusive.

(a) Are $E_1$, $E_2$, and $E_3$ pairwise independent?

(b) What additional hypotheses are required to make “yes” the correct
answer in (a)?

(c) With the hypotheses added in (b), are the events $E_1$, $E_2$, and $E_3$
independent?

8.7. Let $E_1$ and $E_2$ be independent events and suppose $E_3$ has probability
zero or one. Show that $E_1$, $E_2$, and $E_3$ are independent events.

8.8. Complete the proof of Theorem 8.1 by proving (a), (b), and (d).

8.9. Suppose the events $E_1, E_2, \ldots, E_n$ are independent and that
$P(E_k) = 1/(k + 1)$ for $1 \le k \le n$. Find the probability that none of the n
events occurs, justifying each step in your calculation.

8.10. Let p be the probability that a man aged x will die in a year. There are
four men (A, B, C, and D) each aged x. We assume that events $E_1$,
$E_2$, $E_3$, $E_4$ are independent whenever $E_1$ is defined in terms of only A's
life, $E_2$ in terms of only B's life, etc. Find the probability that

(a) A will die within the year.

(b) A and B will die but C and D will not die in the year.

(c) Only A will die within the year.

8.11. The president of a company must decide which of two actions to take,
say whether to rent or buy expensive machinery. His vice-president is
likely to make a faulty analysis and thus recommend the wrong decision
with probability .05. The president hires two consultants who separately
study the problem and make their recommendations. After
watching them at work, the president estimates that one consultant
is likely to recommend the wrong decision with probability .05, the
other with probability .10. He decides to take the action recommended
by a majority of the three reports he receives. What is the probability
that he will make a wrong decision? Does the assumption of independence
you have made seem reasonable for this problem?


9. Independent trials 


The notion of “experiments repeated independently under identi- 
cal conditions” is central to empirical science and, as such, is worthy 
of precise formulation. This we do in the present section. Since we 
shall make extensive use of Cartesian product sets, the reader may 
find it helpful to review Section 5 of Chapter 1 at this time. 

Suppose an experiment is under consideration. As we know, we
think instead of its mathematical counterpart, the sample space S,
where

$$S = \{o_1, o_2, \ldots, o_n\}. \tag{9.1}$$

We assume that an acceptable assignment of probabilities has been
made to the simple events of S; i.e., to each $\{o_j\}$ there is assigned a
nonnegative number $P(\{o_j\})$ in such a way that

$$\sum_{j=1}^{n} P(\{o_j\}) = 1. \tag{9.2}$$

Now let us think of performing this experiment and then performing
it again. The succession of two experiments is a new experiment
that we want to describe mathematically. In order to avoid confusing
references to original experiments and this new experiment, it is
convenient to refer to the original experiments as trials and to describe
the new experiment as made up of two trials, each represented by
(or corresponding to) the sample space S. This new experiment is
mathematically defined, as are all experiments, by a sample space.
The elements (outcomes) of this new sample space are all the ordered
pairs $(o_j, o_k)$ denoting the occurrence of outcome $o_j$ at the first trial
and outcome $o_k$ at the second trial. Thus the sample space for the
experiment is the Cartesian product set $S \times S$. Since the sample
space S for each of the two trials making up the experiment has n
elements, there are $n^2$ ordered pairs in $S \times S$.

Before probability questions can be answered for the experiment,
we must make some acceptable assignment of probabilities to the $n^2$
simple events of $S \times S$; i.e., we must assign a nonnegative number to
$\{(o_j, o_k)\}$ for each j and k in such a way that the sum of all $n^2$ numbers
is 1. As we know, there are infinitely many ways of doing this. But
if we say that the two trials are independent, then by definition there is
one and only one way that we must use: the assignment must be made so
that

$$P(\{(o_j, o_k)\}) = P(\{o_j\})P(\{o_k\}) \tag{9.3}$$

for $j = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, n$.
Formula (9.3) expresses the probability of the simple event
$\{(o_j, o_k)\}$ of $S \times S$ as the product of the probabilities of the simple
events $\{o_j\}$ and $\{o_k\}$ of S. Before discussing the significance of this
rule, we first demonstrate that (9.3) provides an acceptable assignment
of probabilities to the simple events of $S \times S$.
The number $P(\{(o_j, o_k)\})$ is certainly nonnegative, since it is the
product of two nonnegative numbers. Now to find the sum of the
probabilities of all simple events of $S \times S$, we first write them in
rows and columns as follows:

$$
\begin{array}{cccc}
P(\{(o_1, o_1)\}) & P(\{(o_1, o_2)\}) & \cdots & P(\{(o_1, o_n)\}) \\
P(\{(o_2, o_1)\}) & P(\{(o_2, o_2)\}) & \cdots & P(\{(o_2, o_n)\}) \\
\vdots & \vdots & & \vdots \\
P(\{(o_n, o_1)\}) & P(\{(o_n, o_2)\}) & \cdots & P(\{(o_n, o_n)\}).
\end{array}
$$

The sum of the probabilities in the first column is

$$
\begin{aligned}
\sum_{j=1}^{n} P(\{(o_j, o_1)\}) &= \sum_{j=1}^{n} P(\{o_j\})P(\{o_1\}), \quad \text{by (9.3)}, \\
&= P(\{o_1\}) \sum_{j=1}^{n} P(\{o_j\}) \\
&= P(\{o_1\}), \quad \text{by (9.2)}.
\end{aligned}
$$

Similarly, the sum of the probabilities in the kth column is $P(\{o_k\})$
for $k = 1, 2, \ldots, n$. The sum of the probabilities of all $n^2$ simple
events is the sum of the column totals,

$$\sum_{k=1}^{n} P(\{o_k\}) = 1,$$

and the assignment specified by (9.3) is acceptable, as claimed.
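
The column-sum argument can be traced on a small case. In the sketch below, the three-point sample space and its probabilities in `P` are hypothetical values chosen only for illustration; the dictionary `P2` holds the assignment that the product rule (9.3) produces.

```python
from fractions import Fraction
from itertools import product

# Hypothetical single-trial assignment on S = {a, b, c}, for illustration.
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

# Product rule (9.3): P({(o_j, o_k)}) = P({o_j}) P({o_k}).
P2 = {(oj, ok): P[oj] * P[ok] for oj, ok in product(P, repeat=2)}

# Sum each column (fixed second member o_k, first member o_j varying).
column_sums = {ok: sum(P2[(oj, ok)] for oj in P) for ok in P}

# The grand total is the sum of the column totals.
total = sum(column_sums.values())

print(column_sums == P, total)  # True 1
```

Each column total reproduces the single-trial probability $P(\{o_k\})$, which is exactly the step that makes the grand total equal to 1.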

We summarize in the following formal definition. 

Definition 9.1. Let S be a sample space with elements $o_1, o_2, \ldots, o_n$
and let $P(\{o_j\})$ be the probability of the simple event $\{o_j\}$ for
$j = 1, 2, \ldots, n$. By the experiment consisting of two independent
trials corresponding to S, we mean the sample space $S \times S$ (the
Cartesian product set of S with itself) whose elements are the $n^2$ ordered
pairs $(o_j, o_k)$ and whose simple events $\{(o_j, o_k)\}$ are assigned
probabilities in accordance with the product rule in Equation (9.3).

Example 9.1. A fair coin is tossed once and then tossed again.
Each toss is a trial represented by the sample space S = {H, T}
whose two simple events are each assigned probability $\tfrac{1}{2}$. The
experiment made up of the two trials is defined by the sample space
$S \times S$, where

$$S \times S = \{HH, HT, TH, TT\}.$$

There are infinitely many acceptable assignments of probabilities
to the four simple events of $S \times S$ (see the discussion in Example
3.7), but if the two tosses are said to be independent, then each simple
event must be assigned probability $\tfrac{1}{4}$, in accordance with Equation
(9.3). We remind the reader of our agreement to write HH rather
than (H, H), HT rather than (H, T), etc. It is customary to write
ordered pairs using parentheses with the two objects of the pair
separated by a comma, but when no confusion can arise we shall continue
to use the less cumbersome notation.

Example 9.2. From a population of n people, one person is
selected at random. Another person is then selected at random from
the full group; i.e., we allow the same person to be selected at
both trials. Each selection (trial) is defined by the sample space
$S = \{1, 2, \ldots, n\}$, where each person is identified by a positive
integer. Each of the n simple events of S is assigned probability 1/n;
i.e., $P(\{j\}) = 1/n$ for $j = 1, 2, \ldots, n$. The experiment made up of
these two trials is called selecting a sample of two with replacement
from the population and is represented by the Cartesian product set
$S \times S$ given by

$$S \times S = \{(j, k) \mid j \in S \text{ and } k \in S\}.$$

To say that the two trials (i.e., the selection of the first person and
the selection of the second person) are independent is to require the
assignment of probabilities to the simple events of $S \times S$ in
accordance with Equation (9.3). Since

$$P(\{(j, k)\}) = P(\{j\})P(\{k\}) = \left(\frac{1}{n}\right)\left(\frac{1}{n}\right) = \frac{1}{n^2},$$

the independence of the trials means that each simple event of $S \times S$
is assigned the same probability $1/n^2$. Thus we have the formal
mathematical counterpart of our intuitive feeling that selecting a
random sample of size two with replacement can be considered as a
succession of two independent selections. (See Problem 9.9.)


Example 9.3. Consider the experiment consisting of two independent
rolls of a fair die. Since each roll corresponds to the sample space
$S = \{1, 2, \ldots, 6\}$, each simple event of which is assigned probability
$\tfrac{1}{6}$, Definition 9.1 demands that the experiment be defined by the
familiar sample space

$$S \times S = \{(1, 1), (1, 2), \ldots, (1, 6), \ldots, (6, 1), (6, 2), \ldots, (6, 6)\}$$

for which each simple event has probability $\tfrac{1}{36}$. Let $E_1$ = “first roll
results in a 6” and $E_2$ = “second roll results in an even number.”
We intuitively expect that the independence of the trials will have
as a consequence that $E_1$ and $E_2$ are independent events. That this
is the case is easy to verify, since

$$P(E_1 \cap E_2) = \tfrac{3}{36}, \quad P(E_1) = \tfrac{6}{36}, \quad \text{and} \quad P(E_2) = \tfrac{18}{36},$$

so that the multiplication rule

$$P(E_1 \cap E_2) = P(E_1)P(E_2)$$

does hold, and the events $E_1$ and $E_2$ are independent, as expected.
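
The three probabilities in Example 9.3 can be obtained by listing the 36 ordered pairs and counting. This is only an illustrative check; the names `space` and `prob` are introduced here.

```python
from fractions import Fraction
from itertools import product

# Sample space S x S for two rolls of a die; each ordered pair has probability 1/36.
S = range(1, 7)
space = list(product(S, repeat=2))

def prob(event):
    """Probability of a set of ordered pairs under the equally likely assignment."""
    return Fraction(len(event), len(space))

E1 = {w for w in space if w[0] == 6}        # first roll results in a 6
E2 = {w for w in space if w[1] % 2 == 0}    # second roll results in an even number

print(prob(E1 & E2), prob(E1), prob(E2))     # 1/12 1/6 1/2
print(prob(E1 & E2) == prob(E1) * prob(E2))  # True
```

The fractions print in lowest terms, so 3/36, 6/36, and 18/36 appear as 1/12, 1/6, and 1/2.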


We feel this result is reasonable because of the special nature of
the events $E_1$ and $E_2$. Of the two independent rolls of the die, the
first roll determines whether or not $E_1$ occurs, and the second roll
determines whether or not $E_2$ occurs. More generally, given any two
independent trials, it seems reasonable to expect that, if the first trial
determines whether or not an event $E_1$ occurs and the second trial
determines whether or not an event $E_2$ occurs, then $E_1$ and $E_2$ will be
independent events. We want to prove this result is generally true.

But first we must define precisely what we mean when we say that
the first trial (or the second trial) determines whether an event E
occurs. Let us refer to Example 9.3 and note that, in the experiment
consisting of two independent rolls of a fair die, we have

$$
\begin{aligned}
E_1 &= \text{“first roll results in 6”} \\
&= \{(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)\} \\
&= \{6\} \times \{1, 2, 3, 4, 5, 6\} \\
&= \{6\} \times S.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
E_2 &= \text{“second roll results in an even number”} \\
&= \{(1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 2), (1, 4), (2, 4), (3, 4), \\
&\qquad (4, 4), (5, 4), (6, 4), (1, 6), (2, 6), (3, 6), (4, 6), (5, 6), (6, 6)\} \\
&= \{1, 2, 3, 4, 5, 6\} \times \{2, 4, 6\} \\
&= S \times \{2, 4, 6\}.
\end{aligned}
$$

Generally, our event E is of course a subset of $S \times S$ and, as such,
is a set of ordered pairs. To say that E is determined by the first
trial means that the first member of the ordered pair is restricted by
the requirement that E occurs, but the second member is
unrestricted. Similarly, to say that E is determined by the second trial
means that the first member of the ordered pair is unrestricted, but
the second member is restricted by the condition that E occurs.
We make the following formal definition.

Definition 9.2. Consider an experiment consisting of two independent
trials, each trial defined by the sample space S. (The two-trial
experiment therefore has as sample space the Cartesian product set
$S \times S$.) An event E (subset of $S \times S$) is said to be determined by the
first trial if and only if there is some subset $C_1$ of S such that

$$E = C_1 \times S.$$

Similarly, an event F is said to be determined by the second trial if and
only if there is some subset $C_2$ of S such that

$$F = S \times C_2.$$

We are now able to state and prove our main result.

Theorem 9.1. Consider the experiment consisting of two independent
trials corresponding to the sample space

$$S = \{o_1, o_2, \ldots, o_n\}.$$

Let $E_1$ and $E_2$ be any two events of $S \times S$ such that $E_1$ is determined
by the first trial and $E_2$ is determined by the second trial. Then $E_1$
and $E_2$ are independent events.


The following lemma is the essential result needed to prove 
Theorem 9.1. 


Lemma. Let $S \times S$ be the sample space defining an experiment
consisting of two independent trials, each trial corresponding to the
sample space S. Let $C_1 \subset S$ and $C_2 \subset S$. Then

$$P(C_1 \times C_2) = P(C_1)P(C_2). \tag{9.4}$$

Proof of Lemma. Let the reader first note that (9.4) reduces to
(9.3) in the special case when $C_1$ and $C_2$ are simple events of S. And,
of course, (9.4) is also true if $C_1$ or $C_2$ is the null event. Now let us
suppose that the subsets $C_1$ and $C_2$ are given as follows:

$$C_1 = \{o_{j_1}, o_{j_2}, \ldots, o_{j_r}\}, \qquad C_2 = \{o_{k_1}, o_{k_2}, \ldots, o_{k_s}\}.$$

The event $C_1 \times C_2$ is the union of all those simple events $\{(o_j, o_k)\}$
of $S \times S$ for which $\{o_j\} \subset C_1$ and $\{o_k\} \subset C_2$. We arrange these events
in r rows and s columns and write

$$
\begin{aligned}
C_1 \times C_2 = {} & \{(o_{j_1}, o_{k_1})\} \cup \{(o_{j_1}, o_{k_2})\} \cup \cdots \cup \{(o_{j_1}, o_{k_s})\} \,\cup \\
& \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \,\cup \\
& \{(o_{j_r}, o_{k_1})\} \cup \{(o_{j_r}, o_{k_2})\} \cup \cdots \cup \{(o_{j_r}, o_{k_s})\}.
\end{aligned}
$$

Now to compute $P(C_1 \times C_2)$ we must add the probabilities of all
these simple events. If we add the probabilities of the simple events
in the first row, we find from the assumed independence of the trials,

$$
\begin{aligned}
\sum_{v=1}^{s} P(\{(o_{j_1}, o_{k_v})\}) &= \sum_{v=1}^{s} P(\{o_{j_1}\})P(\{o_{k_v}\}) \\
&= P(\{o_{j_1}\}) \sum_{v=1}^{s} P(\{o_{k_v}\}) \\
&= P(\{o_{j_1}\})P(C_2),
\end{aligned}
$$

the last equality following from the definition of $P(C_2)$ as the sum of
the probabilities of the simple events whose union is $C_2$. The sum of
the probabilities in any other row is obtained in the same way, the
sum for the uth row $(u = 1, 2, \ldots, r)$ being $P(\{o_{j_u}\})P(C_2)$. We
obtain the sum of the probabilities of all simple events of $C_1 \times C_2$ by
adding these row sums. Thus,

$$
\begin{aligned}
P(C_1 \times C_2) &= \sum_{u=1}^{r} P(\{o_{j_u}\})P(C_2) \\
&= P(C_2) \sum_{u=1}^{r} P(\{o_{j_u}\}) \\
&= P(C_1)P(C_2),
\end{aligned}
$$

and the proof of the lemma is complete.

Proof of Theorem 9.1. Our hypothesis concerning $E_1$ and $E_2$, in
view of Definition 9.2, implies the existence of sets $C_1$ and $C_2$, each
subsets of S, such that

$$E_1 = C_1 \times S, \qquad E_2 = S \times C_2.$$

To prove the theorem, it suffices to prove that the multiplication
rule holds for $E_1$ and $E_2$, i.e.,

$$P(E_1 \cap E_2) = P(E_1)P(E_2). \tag{9.5}$$

We first note (cf. Problem 1.5.5) that

$$E_1 \cap E_2 = (C_1 \times S) \cap (S \times C_2) = C_1 \times C_2. \tag{9.6}$$

Now apply the lemma three times to obtain

$$P(E_1 \cap E_2) = P(C_1 \times C_2) = P(C_1)P(C_2), \tag{9.7}$$

$$P(E_1) = P(C_1 \times S) = P(C_1)P(S) = P(C_1), \tag{9.8}$$

$$P(E_2) = P(S \times C_2) = P(S)P(C_2) = P(C_2), \tag{9.9}$$

where we have used the fact that $P(S) = 1$. Thus we see that the
multiplication rule (9.5) holds, and so $E_1$ and $E_2$ are independent.
The proof of Theorem 9.1 is now complete.
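
Formulas (9.6) through (9.9) can be traced on a small case in which the simple events of S are not equally likely. The sketch below is illustrative only: the four-point sample space, its probabilities in `P`, and the subsets `C1` and `C2` are all hypothetical choices.

```python
from fractions import Fraction
from itertools import product

# Hypothetical single-trial assignment on S = {1, 2, 3, 4}, for illustration.
P = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}
S = set(P)

def prob_S(C):
    """Probability of a subset C of S."""
    return sum(P[o] for o in C)

def prob_SxS(E):
    """Probability of a subset E of S x S under the product rule (9.3)."""
    return sum(P[a] * P[b] for a, b in E)

C1, C2 = {1, 3}, {2, 3, 4}
E1 = set(product(C1, S))   # C1 x S, determined by the first trial
E2 = set(product(S, C2))   # S x C2, determined by the second trial

# (9.6): the intersection is the Cartesian product C1 x C2.
print(E1 & E2 == set(product(C1, C2)))                       # True
# (9.8) and (9.9): the two-trial events have the single-trial probabilities.
print(prob_SxS(E1) == prob_S(C1), prob_SxS(E2) == prob_S(C2))  # True True
# (9.5): the multiplication rule holds.
print(prob_SxS(E1 & E2), prob_SxS(E1) * prob_SxS(E2))        # 5/16 5/16
```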

It is important to note the significance of our results. Referring
back to the dice-rolling experiment in Example 9.3, we recall that the
phrase “first roll results in 6” was interpreted as the description of
the event $E_1 = \{6\} \times S$, a subset of $S \times S$. But this phrase also
describes an event, namely $\{6\}$, which is a subset of S. In general, the
event $E_1 = C_1 \times S$ and the event $C_1$, although events of different
sample spaces, are both determined by the same first trial of the
experiment. The two events should therefore have the
same probability. Formulas (9.8) and (9.9) guarantee that such
expectations are realized.

Note also that when $E_1$ and $E_2$ satisfy the hypotheses of Theorem
9.1, then, as shown by (9.7), we can calculate $P(E_1 \cap E_2)$ as a product
of probabilities of events (subsets) of the sample space S, and we do
not have to do any computations relative to the sample space $S \times S$
of the two-trial experiment.

Our definitions and results can be generalized to any finite number
of repetitions of the same experiment or, still more generally, to any
number of successive experiments whether like or unlike. We omit
proofs.


Definition 9.3. Suppose N is a positive integer and let $S_j$ (for
$j = 1, 2, \ldots, N$) be a sample space with outcomes $o_1^{(j)}, o_2^{(j)}, \ldots, o_{n_j}^{(j)}$.
By the experiment consisting of the succession of N trials, the first
corresponding to $S_1$, the second to $S_2$, etc., we mean the sample space
$S_1 \times S_2 \times \cdots \times S_N$ (the Cartesian product set of $S_1, S_2, \ldots, S_N$)
whose elements are all the $n_1 n_2 \cdots n_N$ ordered N-tuples $(o^{(1)}, o^{(2)},
\ldots, o^{(N)})$ where $o^{(1)} \in S_1$, $o^{(2)} \in S_2$, ..., $o^{(N)} \in S_N$. For each sample
space $S_j$, we assume an acceptable assignment of probabilities to its
simple events. To say that the N trials are independent is to define
the probabilities of all simple events in $S_1 \times S_2 \times \cdots \times S_N$ by the
product rule

$$P(\{(o^{(1)}, o^{(2)}, \ldots, o^{(N)})\}) = P(\{o^{(1)}\})P(\{o^{(2)}\}) \cdots P(\{o^{(N)}\}).$$

Theorem 9.2. Consider the experiment consisting of N independent
successive trials corresponding to the sample spaces $S_1, S_2, \ldots, S_N$.
Let the events $E_1, E_2, \ldots, E_N$ be such that $E_j$ is determined
by the jth trial for $j = 1, 2, \ldots, N$. [To say, for example, that $E_1$ is
determined by the first trial means that there is a subset $C_1$ of $S_1$ such
that

$$E_1 = C_1 \times S_2 \times S_3 \times \cdots \times S_N;$$

to say that $E_2$ is determined by the second trial means that there is a
subset $C_2$ of $S_2$ such that

$$E_2 = S_1 \times C_2 \times S_3 \times \cdots \times S_N;$$

and so on.] Then the events $E_1, E_2, \ldots, E_N$ are independent.

If N = 2 and $S_1 = S_2 = S$, then Definition 9.3 and Theorem 9.2
reduce to Definition 9.1 and Theorem 9.1.

Example 9.4. A fair coin is tossed, then a symmetric die is rolled,
and finally a card is selected at random from a standard deck. We
assume these three trials are independent. Introducing obvious
notation, we see that the event “head” is determined by the first
trial, “even number” is determined by the second trial, and “spade”
is determined by the third trial. Applying Theorem 9.2, we conclude
that “head,” “even number,” and “spade” are independent events.
In particular,

$$
\begin{aligned}
P(\text{head, even number, spade}) &= P(\text{head})P(\text{even number})P(\text{spade}) \\
&= \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{4}\right) = \tfrac{1}{16}.
\end{aligned}
$$
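
The value 1/16 can be checked by building the product sample space for the three unlike trials and counting. This is a sketch under the stated assumption that the coin, the die, and the card draw are each equally likely and that the trials are independent, so that every ordered triple receives the same probability; the rank and suit labels are illustrative names.

```python
from fractions import Fraction
from itertools import product

# The three single-trial sample spaces (labels are illustrative).
coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Product space S1 x S2 x S3; each triple has probability (1/2)(1/6)(1/52).
space = list(product(coin, die, deck))

def prob(pred):
    """Probability of the event described by pred under equal weights."""
    return Fraction(sum(1 for w in space if pred(w)), len(space))

def head(w):
    return w[0] == "H"

def even(w):
    return w[1] % 2 == 0

def spade(w):
    return w[2][1] == "spades"

joint = prob(lambda w: head(w) and even(w) and spade(w))
print(joint, prob(head) * prob(even) * prob(spade))  # 1/16 1/16
```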


Example 9.5. A quiz has four questions of multiple-choice type. 
There are three possible answers for each question, but only one an- 
swer is right. Assuming a student guesses at random for his answer 
to each question and that his successive guesses are independent, 
what is the probability that he gets more right than wrong answers? 

The sample space for each trial (answering a question) is 
S = {R, W}, where R denotes a right answer, W a wrong answer. 
We are given that 

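$$P(\{R\}) = \tfrac{1}{3}, \qquad P(\{W\}) = \tfrac{2}{3}.$$

For the four-question test, the sample space is $S \times S \times S \times S$.
The event “3 or 4 right” is the subset

{RRRR, RRRW, RRWR, RWRR, WRRR}.

Because the trials are independent,

$$P(\{RRRR\}) = \left(\tfrac{1}{3}\right)^4, \qquad P(\{RRRW\}) = \left(\tfrac{1}{3}\right)^3\left(\tfrac{2}{3}\right),$$

and the probability of the other simple events for which exactly three
answers are right is also $\left(\tfrac{1}{3}\right)^3\left(\tfrac{2}{3}\right)$. Hence

$$P(\text{3 or 4 right}) = \left(\tfrac{1}{3}\right)^4 + 4\left(\tfrac{1}{3}\right)^3\left(\tfrac{2}{3}\right) = \tfrac{1}{9}.$$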
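
The same answer is obtained by enumerating all 16 elements of $S \times S \times S \times S$ and adding the product-rule probabilities of those with at least three right answers. The variable names below are introduced only for this check.

```python
from fractions import Fraction
from itertools import product

# Single-trial probabilities from Example 9.5.
P = {"R": Fraction(1, 3), "W": Fraction(2, 3)}

more_right = Fraction(0)
favorable = []
for w in product("RW", repeat=4):
    if w.count("R") >= 3:
        # Product rule: multiply the single-trial probabilities.
        pr = Fraction(1)
        for answer in w:
            pr *= P[answer]
        more_right += pr
        favorable.append("".join(w))

print(favorable)   # ['RRRR', 'RRRW', 'RRWR', 'RWRR', 'WRRR']
print(more_right)  # 1/9
```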


PROBLEMS 


9.1. (a) A fair coin is tossed three independent times. Determine a suitable
sample space for this three-trial experiment and make the required
assignment of probabilities to its simple events.

(b) Repeat part (a), but now assume that the coin is constructed so
that the probability of head on any toss is p $(0 \le p \le 1)$ and the
probability of tail is q = 1 - p. (Cf. Problem 8.4.)

9.2. A random sample of five is selected with replacement from a population
of which 40 percent are female and 60 percent are male. Define the
sample space for this experiment and assign probabilities to its simple
events. Find the probability that the sample contains

(a) no males
(b) at least one male
(c) exactly one male
(d) all males

9.3. A test has ten questions of multiple-choice type. There are six choices
for each answer, but only one is correct. Suppose a student guesses his
answer to each question. (For example, he can toss a fair die and let
his answer be determined by the number that turns up.) Assuming his
guesses are independent, define a sample space for this experiment
and assign probabilities to its simple events. Find the probability that
he gets nine or ten correct answers.

9.4. At a busy street intersection, it is estimated that a jaywalker will be
hit by a car with probability .01. Assuming individual trips form
independent trials, find the probability of a jaywalker remaining unhit
if he crosses the street twice per day for 30 days.

9.5. A football team wins its weekly game with probability .7, loses with
probability .2, and ties with probability .1. Consider the games played
on three consecutive weekends as a three-trial experiment in which the
trials are independent. Find the probability that the number of wins
exceeds the sum of the number of losses and ties.

9.6. A baseball player approximates his chances at bat as follows:
probability .3 of getting a hit, .1 of getting a base on balls, and .6 of being
out. Consider the four times the player is at bat in a game as four
independent trials and compute the probability that he gets (a) one
walk and three hits, (b) one walk, one hit, and is put out twice.

9.7. Suppose a missile has probability 1/2 of destroying its target and
probability 1/2 of missing it. Assuming the missile firings form independent
trials, determine the number of missiles that should be fired at a target
in order to make the probability of destroying the target at least .99.

9.8. (a) Each of two urns contains three identical balls numbered from 1
to 3. One ball is drawn from each urn, and we assume these
drawings are independent. What is the probability that 2 is the
greatest number drawn?

(b) Each of k urns contains n identical balls numbered from 1 to n.
One ball is drawn from each urn, and we assume these drawings are
independent. What is the probability that m is the greatest number
drawn?

9.9. From a population of n people, one person is selected at random. A
second person is then selected at random from the remaining (n - 1)
people. Imagine the n people lined up in order in positions numbered
1, 2, ..., n. The first trial (selecting the first person) amounts to
selecting one of the n positions, and so can be represented by the sample
space $S_n = \{1, 2, \ldots, n\}$, in which each simple event is assigned
probability 1/n. After this first person is selected, imagine the remaining
(n - 1) people keeping their relative positions but closing ranks so
that they are lined up in positions numbered 1, 2, ..., n - 1. The
second trial (selecting the second person) amounts to selecting one of
these (n - 1) positions, and so can be represented by the sample space
$S_{n-1} = \{1, 2, \ldots, n - 1\}$, in which each simple event is given
probability 1/(n - 1). If these two trials are assumed to be independent,
then the experiment consisting of the two independent trials is called
selecting a random sample of two without replacement from the population.

(a) What sample space are we thus led to for the experiment of
selecting a sample of two without replacement from the population?
What probability is assigned to each simple event of this sample
space?

(b) Suppose n = 26 an 
Y, Z. You select a ran 
this population and re 
were selected? 

(c) Generalize our discussion an 
sample of N without replacement 
can be considered as à succession of 
(Of course, N < n.) 

(d) With n — 26 as in (b), suppose N 
replacement and the outcome repor 
Which four people were selected? 


d these 26 people are named A, B, C, +++, X, 
dom sample of two without replacement from 
port the outcome (2, 3). Which two people 


d show that the selection of a random. 
from a population of n people 
N independent selections. 


= 4 people were selected without 
ted as the 4-tuple (2, 3, 1, 22). 


9.10. We have an n-trial experiment, each trial of which corresponds to the
sample space S. Show that the null event $\emptyset$ and the entire sample space
for the n-trial experiment are determined by every trial of the
experiment. Does this seem reasonable?


10. A probability model in genetics 

Probability concepts have come to play an increasingly important
role, not only as the foundation of mathematical statistics, but also
in formulating mathematical models for phenomena in all the sciences,
biological, physical, and social. In one brief section, we can
hardly hope to do more than illustrate the latter kind of application.
We shall use the theory developed to this point, especially the ideas
of conditional probability and independent trials, to consider (in
greatly oversimplified form) some important questions arising in
population genetics and involving the factors influencing evolution.
Incidentally, we are also able to illustrate the use in probability of
the method of difference equations.*

We restrict our attention to a single gene which has only two forms: 
recessive (r) and dominant (D). We assume that each individual in 
the population under consideration has two such genes in his chro- 
mosomes and therefore can be classified as one of the following types: 
(1) pure dominant, DD, in which both genes are of dominant form; 
(2) hybrid, rD, in which one gene is recessive and the other dominant; 
(3) pure recessive, rr, in which both genes are recessive. (Biologists
refer to these classes as genotypes; DD individuals are called
homozygous dominant, rr individuals are homozygous recessive, and rD
individuals are heterozygous.)

The genetic make-up (with respect to this particular gene) of each 
generation is described by the proportions of individuals of this gen- 
eration in the three genotypes. If we think of drawing a sample of one 
individual at random from this generation, then the proportion of 
any genotype will be the probability of the individual being of that 
genotype. Thus we introduce symbols as follows: 


n = generation number $(0, 1, 2, \ldots)$,
$u_n$ = probability that individual selected from nth generation is DD,
$2v_n$ = probability that individual selected from nth generation is rD,
$w_n$ = probability that individual selected from nth generation is rr.

It is clear that

(10.1)    $u_n + 2v_n + w_n = 1$    $(n = 0, 1, 2, \ldots)$.


The general problem of population genetics can be formulated in
the following manner: Given the initial (n = 0) probabilities $u_0$, $2v_0$,
$w_0$ and a set of assumptions describing the dependence of future
generations on this initial one (i.e., assumptions concerning the mating
system, gene mutations, forces of natural selection, etc.), find the
genotype probabilities for $n \ge 1$.


* Additional material on probability methods in genetics can be found in
Chapter 3 of the book by Neyman listed in the references at the end of this
chapter. For a systematic exposition of the construction and application of
a probability model in psychology, see R. R. Bush and F. Mosteller,
Stochastic Models for Learning, John Wiley and Sons, Inc., 1955. A number of
articles containing probability models appear in P. F. Lazarsfeld (Ed.),
Mathematical Thinking in the Social Sciences, The Free Press, 1954. The
method of difference equations used in this section is expounded in the
author's Introduction to Difference Equations, John Wiley and Sons, Inc.,
1958.

We shall consider a problem of this sort in which the system is one
of random (Mendelian) mating, sometimes called panmixia, modified
by both selection and mutation forces. The salient features of this
model are described in the context of the following rules for obtaining
an individual of any generation (say the (n + 1)st) if the genotype
probabilities $u_n$, $2v_n$, $w_n$ are known for the preceding generation (the
nth).

(i) A male parent is selected at random from this population; i.e.,
the probabilities are $u_n$, $2v_n$, $w_n$ for such a parent to be DD, rD, rr
respectively. (We assume that genotypes occur among males and
females with the same probabilities as in the whole population.) A
single gene is then selected at random from the two genes that the
male parent carries. For example, if the male parent is DD, then the
gene D is transmitted with probability 1; if the parent is rD, then
genes r and D are each selected with probability 1/2, etc.

(ii) A female parent is selected at random from the population, as
in (i). A single gene is then selected at random from the two genes
carried by the female parent.

The genotype of the new individual of the (n + 1)st generation is
determined by the union of the male and the female genes selected in
(i) and (ii). We shall speak of steps (i) and (ii) as trials of the
experiment in which an individual of one generation is formed from
individuals of the preceding generation.

Random Mendelian mating (panmixia) with respect to the single
gene under study is characterized by the assumption that the
selections involved in trials (i) and (ii) are carried out at random and that
these trials are independent.
As an example, let us calculate the probability of genotype rr in
the (n + 1)st generation of a population undergoing random Mendelian
mating. An rr individual can arise only if the genes selected from
both the male and female parent are recessive. The event $E_1$ (male
gene is r) can occur in two mutually exclusive ways:
(a) the male parent is rr and then an r gene is transmitted, or (b) the
male parent is rD and then an r gene is transmitted. We thus find
that

(10.2)    $P(E_1) = (w_n)(1) + (2v_n)(\tfrac{1}{2}) = v_n + w_n.$

Similarly, we observe that if $E_2$ is the event "female gene is r," then

$P(E_2) = v_n + w_n.$

But $E_1$ is determined by trial (i) and $E_2$ is determined by trial (ii).
Since the trials are assumed independent, we conclude by Theorem
9.1 that $E_1$ and $E_2$ are independent events. Hence

$P(E_1 \cap E_2) = P(E_1)P(E_2).$

Since $E_1 \cap E_2$ is the event "individual of (n + 1)st generation is rr,"
we have

$P(E_1 \cap E_2) = w_{n+1},$

and therefore

(10.3)    $w_{n+1} = (v_n + w_n)^2.$
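
The two-trial argument leading to (10.3) can be checked by brute-force enumeration. The following is only a sketch: the function names and the starting proportions are illustrative choices, not taken from the text. It builds the distribution of the gene transmitted by a randomly chosen parent, multiplies the male and female probabilities (independence of trials (i) and (ii)), and compares the resulting rr probability with $(v_n + w_n)^2$.

```python
from fractions import Fraction
from itertools import product


def gene_distribution(u, two_v, w):
    """Probability that a randomly chosen parent transmits gene D or gene r."""
    half = Fraction(1, 2)
    return {"D": u * 1 + two_v * half, "r": w * 1 + two_v * half}


def next_generation(u, two_v, w):
    """Genotype probabilities (u, 2v, w) after one round of random mating."""
    male = gene_distribution(u, two_v, w)
    female = gene_distribution(u, two_v, w)
    probs = {"DD": Fraction(0), "rD": Fraction(0), "rr": Fraction(0)}
    for (gm, pm), (gf, pf) in product(male.items(), female.items()):
        if gm == gf:
            key = "DD" if gm == "D" else "rr"
        else:
            key = "rD"
        probs[key] += pm * pf  # trials (i) and (ii) are independent
    return probs["DD"], probs["rD"], probs["rr"]


# Illustrative starting proportions (an assumption, not from the text).
u0, two_v0, w0 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
u1, two_v1, w1 = next_generation(u0, two_v0, w0)
v0 = two_v0 / 2
assert w1 == (v0 + w0) ** 2  # agrees with (10.3)
print(u1, two_v1, w1)
```

Exact fractions are used so that the comparison with (10.3) is an equality rather than a floating-point approximation.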


Although we can continue and similarly derive the other genotype 
probabilities in the (n + 1)st generation in terms of those in the pre- 
ceding generation, we turn instead to a more general model in which 
the assumptions of random mating are modified by mutation and 
selection forces. 

We first assume that the dominant gene D mutates to the recessive
form r with probability $\alpha$ $(0 \le \alpha \le 1)$, this mutation being
independent of the source (male or female) of the gene. Let us think of the
mutation occurring, if at all, after the male and female genes are
selected, but before their union. To illustrate, we recalculate the
probability of the event $E_1$ as follows. $E_1$ can now occur in four
mutually exclusive ways: (a) male parent is DD, gene selected is D, this
gene mutates to r; (b) male parent is rD, gene selected is r; (c) male
parent is rD, gene selected is D, this gene mutates to r; (d) male
parent is rr, gene selected is r. Thus we find

(10.4)    $P(E_1) = (u_n)(1)(\alpha) + (2v_n)(\tfrac{1}{2}) + (2v_n)(\tfrac{1}{2})\alpha + (w_n)(1)$
                  $= (v_n + w_n) + \alpha(u_n + v_n).$

Note that (10.4) reduces to (10.2) in case there is no mutation
($\alpha = 0$).
We also add a selection force which affects the participation of pure
recessive individuals in the mating process. Up to this point, we have
assumed that all genes are viable. Now let us suppose that the
fertility of rr individuals is impaired so that a proportion $\beta$ of these
individuals (whether male or female) do not have viable genes to
transmit. To illustrate the impact of this assumption, we again
calculate the probability of $E_1$ (male gene is r) but now on the condition
F that the gene is viable; i.e., we calculate $P(E_1 \mid F)$. For the moment,
we neglect the mutation effect. Until now P(F) has been 1, but with
the addition of fertility differences, we note that the male gene is
viable if and only if the male parent is not one of those whose fertility
is impaired. Hence

(10.5)    $P(F) = 1 - \beta w_n.$

Also we calculate

$P(E_1 \cap F) = (2v_n)(\tfrac{1}{2}) + (w_n - \beta w_n)(1) = v_n + w_n - \beta w_n.$

Hence

(10.6)    $P(E_1 \mid F) = \dfrac{v_n + w_n - \beta w_n}{1 - \beta w_n}.$

Note that (10.6) reduces to (10.2) when $\beta = 0$ (no selection force or
all genes viable).
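
As a numerical check on (10.4) and (10.6), the case-by-case sums can be written out directly and compared with the simplified right-hand sides. This is a minimal sketch; the function names and the sample values of the proportions, $\alpha$, and $\beta$ are illustrative assumptions.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def p_male_gene_r_with_mutation(u, two_v, w, alpha):
    """P(E_1) with mutation D -> r, summing the four cases listed for (10.4)."""
    return (u * 1 * alpha            # (a) DD parent, D selected, mutates to r
            + two_v * HALF           # (b) rD parent, r selected
            + two_v * HALF * alpha   # (c) rD parent, D selected, mutates to r
            + w * 1)                 # (d) rr parent, r selected


def p_male_gene_r_given_viable(u, two_v, w, beta):
    """P(E_1 | F) without mutation, as P(E_1 and F) / P(F); see (10.5), (10.6)."""
    p_viable = 1 - beta * w
    p_r_and_viable = two_v * HALF + (w - beta * w) * 1
    return p_r_and_viable / p_viable


# Illustrative values (assumptions, not from the text).
u, two_v, w = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
alpha, beta = Fraction(1, 10), Fraction(1, 5)
v = two_v / 2

assert p_male_gene_r_with_mutation(u, two_v, w, alpha) == (v + w) + alpha * (u + v)
assert p_male_gene_r_given_viable(u, two_v, w, beta) == (v + w - beta * w) / (1 - beta * w)
# With alpha = 0 or beta = 0 both expressions fall back to (10.2).
assert p_male_gene_r_with_mutation(u, two_v, w, 0) == v + w
assert p_male_gene_r_given_viable(u, two_v, w, 0) == v + w
```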

We should observe that the mutation and selection forces are oppo- 
site in effect: the mutation force is directed toward an increase of 
the recessive gene in the population, whereas the selection force tends 
to decrease the relative frequency of this gene. 

Our problem can now be summarized as follows: Suppose
that our system of mating is panmixia modified by the above
mutation and selection forces. Let $f_n$ equal the probability (proportion)
of the recessive gene r among the genes of parents in the nth
generation. The genotype probabilities $u_0$, $2v_0$, $w_0$ are given for an initial
(n = 0) generation of parents. How does $f_n$ depend upon n? Is there
an equilibrium value of $f_n$ that is approached as n gets larger and
larger and, if such an equilibrium proportion exists, how quickly is it
reached and what is its dependence on the initial genotypic
composition of the population?

We have already calculated the probability $P(E_1 \mid F)$ that a viable
gene produced by a parent of the nth generation is recessive, but we
neglected the mutation force. Similarly we find that the probability,
say $p_n$, that a viable gene produced by a parent of the nth generation
is dominant (before mutation process occurs) is given by

(10.7)    $p_n = \dfrac{u_n + v_n}{1 - \beta w_n}.$

It follows that the probability that this viable gene is dominant after
mutation is $p_n(1 - \alpha)$, and so we find

(10.8)    $f_n = 1 - (1 - \alpha)p_n.$

Knowing $f_n$ we obtain the genotype probabilities in generation
(n + 1):

          $u_{n+1} = (1 - f_n)^2$
(10.9)    $2v_{n+1} = 2f_n(1 - f_n)$
          $w_{n+1} = f_n^2$

and using these together with (10.7) and (10.8) we derive (cf.
Problem 10.4) a difference equation for the probabilities $f_n$:

(10.10)    $f_{n+1} = 1 - \dfrac{(1 - \alpha)(1 - f_n)}{1 - \beta f_n^2}$    $(n = 0, 1, 2, \ldots)$.
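
The relationship between (10.7)-(10.9) and the single difference equation (10.10) can be illustrated numerically. In the sketch below (function names and parameter values are illustrative assumptions), one generation is advanced in two ways: by passing through the genotype probabilities, and by applying (10.10) directly. The two routes should agree exactly.

```python
from fractions import Fraction


def recessive_fraction(u, two_v, w, alpha, beta):
    """f_n computed from genotype probabilities via (10.7) and (10.8)."""
    v = two_v / 2
    p = (u + v) / (1 - beta * w)   # (10.7): viable gene is dominant
    return 1 - (1 - alpha) * p     # (10.8)


def genotypes_from(f):
    """(u, 2v, w) of the next generation, given f_n, as in (10.9)."""
    return (1 - f) ** 2, 2 * f * (1 - f), f ** 2


def step(f, alpha, beta):
    """One application of the difference equation (10.10)."""
    return 1 - (1 - alpha) * (1 - f) / (1 - beta * f ** 2)


# Illustrative parameters and initial generation (assumptions).
alpha, beta = Fraction(1, 10), Fraction(1, 5)
f = recessive_fraction(Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), alpha, beta)
for n in range(3):
    via_genotypes = recessive_fraction(*genotypes_from(f), alpha, beta)
    assert via_genotypes == step(f, alpha, beta)
    f = via_genotypes
    print(n + 1, float(f))
```

Setting `alpha = beta = 0` in `step` returns its argument unchanged, which is the situation with neither mutation nor selection.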

Our problem can now be formulated analytically: Given numbers
$f_0$, $\alpha$, $\beta$, each between 0 and 1 inclusive, find the dependence of $f_n$
(the proportion of recessive genes among parents of generation
number n) on the generation number n and the prescribed parameters $f_0$
(the proportion of recessive genes among parents of the initial
generation n = 0), $\alpha$ (the probability with which a dominant gene mutates
to the recessive form), and $\beta$ (the proportion of rr parents in each
generation who do not have viable genes).

This problem is quite difficult to solve in all generality, but there
are some important special cases that are fairly easy.


Case 1: Panmixia ($\alpha = 0$, $\beta = 0$; neither mutation nor selection).

In this case, the difference equation (10.10) reduces to

(10.11)    $f_{n+1} = f_n$    $(n = 0, 1, 2, \ldots)$

and we immediately conclude that $f_n = f_0$ for all n. In view of
Equations (10.9), it follows that

$u_1 = (1 - f_0)^2 = u_2 = u_3 = \cdots$
$2v_1 = 2f_0(1 - f_0) = 2v_2 = 2v_3 = \cdots$
$w_1 = f_0^2 = w_2 = w_3 = \cdots$

We have in this way demonstrated the so-called Hardy-Weinberg
law: With repeated mating under panmixia, the distribution of the
three genotypes (with respect to a single gene) is fixed after one
generation. Thus we see that the Mendelian laws are conservative in
effect, and one may regard evolution as the study of those forces
(mutation, selection, assortative mating, etc.) which tend to disturb
this unchanging equilibrium of genotype proportions.

Case 2: $\alpha = 0$ and $\beta = 1$; i.e., no mutation and all pure recessives
completely sterile.

Now Equation (10.10) becomes

(10.12)    $f_{n+1} = 1 - \dfrac{1}{1 + f_n}$    $(n = 0, 1, 2, \ldots)$.

If we make the substitution

(10.13)    $g_n = \dfrac{1}{f_n}$

then (10.12) takes on the simpler form

(10.14)    $g_{n+1} = g_n + 1$    $(n = 0, 1, 2, \ldots)$.

From (10.14), with n = 0, we find $g_1 = g_0 + 1$. Then putting n = 1
in (10.14) we see that $g_2 = g_1 + 1 = g_0 + 2$. Writing (10.14) with
n = 2 and then using our newly found expression for $g_2$, we find
$g_3 = g_2 + 1 = g_0 + 3$. By mathematical induction, we can prove

(10.15)    $g_n = g_0 + n$    $(n = 0, 1, 2, \ldots)$.

In view of (10.13) we thus find that

(10.16)    $f_n = \dfrac{f_0}{1 + n f_0}$    $(n = 0, 1, 2, \ldots)$.

Equation (10.16) describes the elimination of a single gene under
complete sterilization of pure recessives. Although $f_n$ decreases and
approaches zero as n gets larger, this decrease is quite slow. For
example, let us compute the number of generations required in order
that, with complete sterility of all rr individuals, the proportion of
the r gene decreases to half its initial value. We put $f_n = \tfrac{1}{2}f_0$ in
(10.16) and solve to find $n = 1/f_0$. Thus if $f_0 = .001$, then 1000
generations are required to reduce the proportion of the recessive gene to
.0005. Quantitative considerations of this kind are clearly of
importance in eugenics.
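
The slow decline in Case 2 can be reproduced by iterating the recurrence $f_{n+1} = 1 - 1/(1 + f_n)$ and comparing each term with the closed form $f_0/(1 + n f_0)$. This is a minimal sketch using exact fractions; the helper name `closed_form` is an illustrative choice.

```python
from fractions import Fraction


def closed_form(f0, n):
    """Closed-form solution f_n = f_0 / (1 + n * f_0) for alpha = 0, beta = 1."""
    return f0 / (1 + n * f0)


f0 = Fraction(1, 1000)
f = f0
for n in range(1, 1001):
    f = 1 - 1 / (1 + f)          # one step of the Case 2 recurrence
    assert f == closed_form(f0, n)

print(f)  # 1/2000, i.e. half of f0 after 1000 generations
```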

Other conclusions that follow from Equation (10.10) are included
in the problems.

PROBLEMS 


10.1. Find the proportions of the three genotypes in the first (n = 1) and
second (n = 2) generation of random Mendelian mating if


(a) $u_0 = 0$, $2v_0 = \tfrac{1}{2}$, $w_0 = \tfrac{1}{2}$
(b) $u_0 = w_0 = 0$, $2v_0 = 1$
(c) $u_0 = 1$, $2v_0 = w_0 = 0$
(d) $u_0 = 2v_0 = 0$, $w_0 = 1$
10.2. Redo the preceding problem with a mutation force added. Assume
$\alpha = 0.1$ is the probability that the dominant gene D mutates to the
recessive form r.

10.3. Redo Problem 10.1 assuming both mutation and selection forces are
present. Use Equations (10.7)-(10.9) and assume $\alpha = 0.1$, $\beta = 0.2$. In
each case write the first three terms of the sequence $f_0, f_1, f_2, \ldots$.

10.4. To derive the difference equation (10.10), proceed as follows. (a) Use
(10.8) to write $f_{n+1}$ in terms of $p_{n+1}$. (b) In the equation found in (a)
replace $p_{n+1}$ by an equivalent expression obtained by using (10.7) and
involving $u_{n+1}$, $v_{n+1}$, $w_{n+1}$. (c) In the equation obtained in (b),
substitute for $u_{n+1}$, $v_{n+1}$, $w_{n+1}$ the expressions given in (10.9) and simplify.

10.5. Write out the details of the derivation of Equation (10.14) in the text.

10.6. Show by mathematical induction that (10.15) is the solution of the
difference equation (10.14) for all $n = 0, 1, 2, \ldots$.

10.7. Consider the case in which all pure recessives are completely sterile
(i.e., $\beta = 1$), but the supply of the recessive gene r is replenished by
the mutation D → r; i.e., $\alpha > 0$.

(a) Show that Equation (10.10) becomes

(10.17)    $f_{n+1} = \dfrac{f_n + \alpha}{1 + f_n}$    $(n = 0, 1, 2, \ldots)$.

(b) Suppose $f_0 = \sqrt{\alpha}$. Show that then $f_n = \sqrt{\alpha}$ for $n = 1, 2, \ldots$.

(c) Suppose $\alpha = 1$. Find the terms of the sequence $f_0, f_1, f_2, \ldots$.

(d) Suppose $f_0 \ne \sqrt{\alpha}$ and $\alpha < 1$. Let $f_n = \sqrt{\alpha} + 1/g_n$ and show that
$g_n$ satisfies the difference equation

(10.18)    $g_{n+1} = A g_n + B$    $(n = 0, 1, 2, \ldots)$,

where

$A = \dfrac{1 + \sqrt{\alpha}}{1 - \sqrt{\alpha}}, \qquad B = \dfrac{1}{1 - \sqrt{\alpha}}.$

(e) Show that Equation (10.18) is satisfied for all $n = 0, 1, 2, \ldots$ by

$g_n = \left(g_0 - \dfrac{B}{1 - A}\right)A^n + \dfrac{B}{1 - A}.$

(f) Conclude that if $f_0 \ne \sqrt{\alpha}$ and $\alpha < 1$, then Equation (10.17) implies

(10.19)    $f_n = \sqrt{\alpha} + \dfrac{1}{\left(\dfrac{1}{2\sqrt{\alpha}} + \dfrac{1}{f_0 - \sqrt{\alpha}}\right)\left(\dfrac{1 + \sqrt{\alpha}}{1 - \sqrt{\alpha}}\right)^n - \dfrac{1}{2\sqrt{\alpha}}}$    $(n = 0, 1, 2, \ldots)$.

(g) From (10.19) note that since A > 1, $A^n$ gets larger and larger as n
increases. Conclude that $f_n$ approaches $\sqrt{\alpha}$ as n increases without
bound. Thus the population approaches a balanced (equilibrium)
state in which the recessive gene occurs with proportion $\sqrt{\alpha}$.
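
The behavior described in Problem 10.7 can be explored numerically before attempting the proofs. The sketch below is only an illustration under stated assumptions: the recurrence used is the $\beta = 1$ form $f_{n+1} = (f_n + \alpha)/(1 + f_n)$, the closed form is the one obtained from the substitution $f_n = \sqrt{\alpha} + 1/g_n$, and the values of $\alpha$ and $f_0$ are arbitrary choices.

```python
import math


def iterate(f0, alpha, n):
    """Apply f -> (f + alpha) / (1 + f) a total of n times, starting from f0."""
    f = f0
    for _ in range(n):
        f = (f + alpha) / (1 + f)
    return f


def closed_form(f0, alpha, n):
    """Closed form via g_n = (g_0 - B/(1-A)) A^n + B/(1-A), with B/(1-A) = -1/(2 sqrt(alpha))."""
    s = math.sqrt(alpha)
    a = (1 + s) / (1 - s)
    g = (1 / (2 * s) + 1 / (f0 - s)) * a ** n - 1 / (2 * s)
    return s + 1 / g


alpha, f0 = 0.25, 0.1   # illustrative values; requires f0 != sqrt(alpha), alpha < 1
for n in (0, 1, 5, 20):
    print(n, iterate(f0, alpha, n), closed_form(f0, alpha, n))
```

For these values the printed terms move toward $\sqrt{0.25} = 0.5$, in line with part (g).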
Chapter 3


SOPHISTICATED COUNTING


1. Counting techniques and probability problems


Up to this point, we have illustrated our theory with examples that
require only very direct and elementary counting procedures. We
accomplished this by restricting ourselves either to experiments
leading to sample spaces with a small number of elements (where we were
able to count by direct enumeration) or, if the experiment had a large
number of possible outcomes, to events whose elements could
be counted by a direct application of the fundamental principle of
counting. These restrictions were intentional, since our aim was to
present the basic theory of probability unencumbered by difficulties
due to incidental counting problems. But now we must face the fact
that many interesting and important probability problems require
more sophisticated counting techniques. We develop a few of these
techniques in this section. Since we shall be using the fundamental
principle of counting time and again, the reader may find it helpful
to review the discussion on pp. 9-11.

The following counting problems are solved in this section. We
suppose that a nonempty set A with n (distinct) elements is given.

Problem 1. For any positive integer $r \le n$, find the number of
ordered r-tuples the objects of which are different elements of A.
Each such ordered r-tuple specifies an ordered arrangement or
permutation of the r objects taken from the n elements of A. Because of this
fact, the required number is denoted by P(n, r).

Problem 2. For any nonnegative integer $r \le n$, find the number
of subsets of A that have exactly r elements. (Such a subset will be
called an r-subset of A.) The number of r-subsets of a set with n
elements is denoted by $\binom{n}{r}$. Each r-subset specifies a selection
without regard to order of r elements from the n elements in A.

Problem 3. We are given k numbered cells and k nonnegative
integers $n_1, n_2, \ldots, n_k$ whose sum is n; i.e., for $k \ge 1$,

$n_1 + n_2 + \cdots + n_k = n.$

Find the number of ways of putting the n elements of A into these k
cells so that $n_1$ elements are in the first cell, $n_2$ elements are in the
second cell, ..., $n_k$ elements are in the kth cell. This number is
denoted by

$\binom{n}{n_1, n_2, \ldots, n_k}.$

A simple example will help to clarify these problems and the special
notation we have introduced.


Example 1.1. Let A = {a, b, c, d, e} be a set of n = 5 elements.
We list the following ordered pairs (2-tuples) formed by selecting two
elements of A and paying attention to the order in which they are
selected:

         ab ac ad ae bc bd be cd ce de
(1.1)
         ba ca da ea cb db eb dc ec ed.

These are the 20 ordered pairs or 20 permutations of two elements
from the five elements in A. In symbols, P(5, 2) = 20. If a chairman
and a secretary must be elected from among five men on a committee,
there are P(5, 2) = 20 different possible results of the election.
Note that we correctly count as different the results leading to
chairman = a, secretary = b on the one hand and chairman = b,
secretary = a on the other.

Now, however, suppose we want to elect a subcommittee of two
men from the five committee members. The order in which the
choices are made is now irrelevant; we care only which two men are
elected. Thus we want the number of 2-subsets of the set A. The
pairs of elements in the ten possible 2-subsets are enumerated in the
first row of (1.1). Although ab and ba are different permutations, they
determine the same 2-subset of A since {a, b} = {b, a}. We thus find
that there are ten 2-subsets of A. In symbols, $\binom{5}{2} = 10$.


Finally, suppose four members of the committee are arranging 
rides to the funeral of the fifth member, say e. Three cars are avail- 
able and can take 2, 1, and 1 passenger respectively. We list the 
possible assignments of men to cars: 


car 1 ab ab ac ac ad ad bc bc bd bd cd cd
car 2 c d b d b c a d a c a b 
car 3 d c d b c b d a c a b a 


Thus we find that there are 12 ways of placing the four objects
(committeemen a, b, c, d) into three numbered cells (the three cars)
so that two objects are in the first cell, one object in the second cell,
and one object in the third cell. In symbols, we have computed
$\binom{4}{2, 1, 1} = 12$. (We killed off committeeman e only to create a
counting problem with an answer small enough for us to easily list all
the possible assignments to cars. Let the reader show that if all five
committee members are being driven to a happy occasion, then there
are 30 ways of assigning the men to three cars so that two ride in
the first car, two in the second car, and one rides in the third car. In
symbols, $\binom{5}{2, 2, 1} = 30$.)
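
The counts in Example 1.1 are small enough to be confirmed by direct enumeration. The following sketch uses Python's standard `itertools` module; the helper `cell_assignments` is an illustrative function written for this purpose, and it assumes the cell sizes add up to the number of objects.

```python
from itertools import combinations, permutations

A = ["a", "b", "c", "d", "e"]

ordered_pairs = list(permutations(A, 2))   # chairman, secretary
two_subsets = list(combinations(A, 2))     # subcommittees of two


def cell_assignments(objects, sizes):
    """Yield every way of filling numbered cells of the given sizes.

    Each result is a tuple of cells; each cell is a tuple of objects.
    Assumes sum(sizes) == len(objects).
    """
    if not sizes:
        yield ()
        return
    for first in combinations(objects, sizes[0]):
        rest = [x for x in objects if x not in first]
        for others in cell_assignments(rest, sizes[1:]):
            yield (first,) + others


print(len(ordered_pairs), len(two_subsets))              # 20 10
print(len(list(cell_assignments(A[:4], [2, 1, 1]))))     # 12
print(len(list(cell_assignments(A, [2, 2, 1]))))         # 30
```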


We turn now to derivations of general formulas that solve the
three problems we have stated. But first a definition to simplify our
notation.

Definition 1.1. If n is a positive integer, then the product of the
integers from 1 to n is called "n factorial" and is denoted by n!. By
special convention, we agree to put 0! = 1.

For example:

1! = 1                      5! = 5·4·3·2·1 = 120
2! = 2·1 = 2                6! = 6·5! = 720
3! = 3·2·1 = 6              7! = 7·6! = 5040
4! = 4·3·2·1 = 24           8! = 8·7! = 40320

Note that in computing 6!, 7!, and 8!, we used the fact that

(1.2)    $(n + 1)! = (n + 1)(n!)$    for $n = 0, 1, 2, \ldots$.

Observe also that when n = 0, Formula (1.2) reads 1! = (1)(0!) and
hence is correct by virtue of our convention that 0! = 1. Although
it may seem artificial now, we assure the reader that defining 0! in
this way will turn out to be very convenient in the formulas that
follow.

Theorem 1.1. With the notation introduced in the statements of
Problems 1-3, we have

(1.3)    $P(n, r) = \dfrac{n!}{(n - r)!}$
         The number of ordered r-tuples or permutations of r objects
         from a set of n objects.

(1.4)    $\binom{n}{r} = \dfrac{n!}{r!\,(n - r)!}$
         The number of r-subsets (subsets with exactly r elements) of a
         set of n elements.

(1.5)    $\binom{n}{n_1, n_2, \ldots, n_k} = \dfrac{n!}{n_1!\,n_2! \cdots n_k!}$
         The number of ways of placing n distinct objects into k cells so
         that $n_i$ objects are in cell i for $i = 1, 2, \ldots, k$
         $(n_1 + n_2 + \cdots + n_k = n)$.
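
The three formulas of Theorem 1.1 translate directly into code. In this sketch, `math.perm` and `math.comb` are standard-library functions (available in Python 3.8 and later) that compute P(n, r) and the binomial coefficient; `multinomial` is a hypothetical helper written here to mirror (1.5).

```python
from math import comb, factorial, perm


def multinomial(n, sizes):
    """Number of ways to place n distinct objects into cells of the given sizes."""
    if sum(sizes) != n:
        raise ValueError("cell sizes must add up to n")
    result = factorial(n)
    for size in sizes:
        result //= factorial(size)   # each division is exact
    return result


# Formula (1.3): P(n, r) = n! / (n - r)!
assert perm(5, 2) == factorial(5) // factorial(3) == 20
# Formula (1.4): n! / (r! (n - r)!)
assert comb(5, 2) == factorial(5) // (factorial(2) * factorial(3)) == 10
# Formula (1.5): n! / (n_1! n_2! ... n_k!)
assert multinomial(4, [2, 1, 1]) == 12
assert multinomial(5, [2, 2, 1]) == 30
```

The values checked are those found by listing in Example 1.1.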

Proof. The fundamental principle of counting is the key tool in
proving these formulas. To prove (1.3) note that to form an r-tuple
$(a_1, a_2, \ldots, a_r)$ from the n given objects we must choose $a_1$ (task 1)
from the n objects, then choose $a_2$ (task 2) from the remaining n - 1
objects, and so on until we choose $a_r$ (task r) from the remaining
n - r + 1 objects. Hence P(n, r), the number of ordered r-tuples
from a set of n objects, is equal to the number of ways of completing
these r tasks in the stated order. By the fundamental principle we find

(1.6)    $P(n, r) = n(n - 1) \cdots (n - r + 1).$

Multiplying and dividing by (n - r)! we obtain the alternative form

$P(n, r) = \dfrac{n(n - 1) \cdots (n - r + 1)\,(n - r)!}{(n - r)!}$

from which (1.3) follows by observing that the numerator is indeed
the product of all the positive integers from 1 to n. Note that when
r = n, (1.6) becomes

(1.7)    $P(n, n) = n(n - 1) \cdots 1 = n!.$

Formula (1.3) also gives this result due to our convention that 0! = 1.
Hence (1.3) is true for all positive integers $r \le n$, as claimed.
To prove (1.4) we observe that there are as many ways of writing
an ordered r-tuple as there are ways of completing the following
tasks in the stated order: (1) choose an r-subset from the given set
of n elements and thus determine which r objects will be used to form
the r-tuple, and then (2) arrange these r objects in some order so that
there is a first, a second, ..., an rth object specified. It follows that
P(n, r), the number of ordered r-tuples, must be precisely the number
of ways of completing these two tasks. The first task can be done in
$\binom{n}{r}$ ways by definition of this symbol. The second task can be done
in P(r, r) = r! ways by virtue of the meaning of P(r, r) together with
Formula (1.7). Hence

(1.8)    $P(n, r) = \binom{n}{r}\, r!$

or, using (1.3),

$\binom{n}{r} = \dfrac{P(n, r)}{r!} = \dfrac{n!}{r!\,(n - r)!}$

as claimed in (1.4). Note that this argument fails when r = 0, since
P(n, 0) has not been defined. But we can easily check that (1.4) is
correct when r = 0. For a set with n elements has exactly one 0-subset
(the null set $\emptyset$) and Formula (1.4) yields this answer, since when
r = 0, (1.4) becomes

$\binom{n}{0} = \dfrac{n!}{0!\,n!} = 1.$

(By now the reader should be convinced that putting 0! = 1 is indeed
sensible.)

Finally, to prove (1.5) we use (1.4) in conjunction with the
fundamental principle of counting. To determine the $n_1$ objects that
go into the first cell we choose an $n_1$-subset from the available n
objects. We can therefore allocate $n_1$ objects to the first cell in
$\binom{n}{n_1}$ ways. To determine the $n_2$ objects that are put in the second
cell, we choose an $n_2$-subset from the remaining $n - n_1$ objects.
Hence we can put $n_2$ objects into this second cell in $\binom{n - n_1}{n_2}$ ways.

Continuing in this way, we see that to determine the $n_k$ objects that
go into the last cell we must choose an $n_k$-subset from the remaining
$n - (n_1 + n_2 + \cdots + n_{k-1})$ objects, and this can be done in

$\binom{n - n_1 - n_2 - \cdots - n_{k-1}}{n_k}$

ways. By the fundamental principle of counting, we conclude that

$\binom{n}{n_1, n_2, \ldots, n_k} = \binom{n}{n_1}\binom{n - n_1}{n_2} \cdots \binom{n - n_1 - \cdots - n_{k-1}}{n_k}.$

Now we use (1.4) to simplify the product on the right. The product
of the first two terms is

$\binom{n}{n_1}\binom{n - n_1}{n_2} = \dfrac{n!}{n_1!\,(n - n_1)!} \cdot \dfrac{(n - n_1)!}{n_2!\,(n - n_1 - n_2)!} = \dfrac{n!}{n_1!\,n_2!\,(n - n_1 - n_2)!}.$

The product of the first three factors is

$\binom{n}{n_1}\binom{n - n_1}{n_2}\binom{n - n_1 - n_2}{n_3} = \dfrac{n!}{n_1!\,n_2!\,(n - n_1 - n_2)!} \cdot \dfrac{(n - n_1 - n_2)!}{n_3!\,(n - n_1 - n_2 - n_3)!}$

$= \dfrac{n!}{n_1!\,n_2!\,n_3!\,(n - n_1 - n_2 - n_3)!}.$

When all k factors are multiplied, we similarly find that

$\binom{n}{n_1, n_2, \ldots, n_k} = \dfrac{n!}{n_1!\,n_2! \cdots n_k!\,(n - n_1 - n_2 - \cdots - n_k)!}.$

But $(n - n_1 - n_2 - \cdots - n_k)! = 0! = 1$, and so the proof of (1.5)
is complete.

Two special cases and an important alternative interpretation of
Formula (1.5) are worth noting here. If we have n objects and n cells,
then there are clearly just as many ways of putting one object in each
cell as there are ordered n-tuples of these n objects. Hence we expect
that

$\binom{n}{1, 1, \ldots, 1} = P(n, n) = n!$

and indeed, Formula (1.5) yields this expected result if we put k = n
and $n_1 = n_2 = \cdots = n_n = 1$.

Also, if we put k = 2 and $n_1 = r$ (and therefore $n_2 = n - r$), then
(1.5) becomes

(1.10)    $\binom{n}{r, n - r} = \dfrac{n!}{r!\,(n - r)!} = \binom{n}{r}.$

This equation merely expresses the fact that there are as many ways
of placing n objects into two cells with r objects in one cell and
(n - r) in the other cell as there are different r-subsets of a set with
n elements. For in determining the r elements in the r-subset (cell 1),
we automatically place the remaining n - r elements in the
complement of the r-subset (cell 2).


A second interpretation of Formula (1.5) is introduced in the fol- 
lowing examples. 


Example 1.2. We know that there are P(5, 5) = 5! = 120
permutations of five distinct letters. But now suppose the five letters are
two a's and three b's. How many different permutations are there
now? We find that there are only ten:

aabbb ababb abbab abbba baabb
babab babba bbaab bbaba bbbaa.

To see how to obtain the number 10 without explicitly enumerating
the permutations, think not of the letters themselves but rather of
the positions they occupy in the permutation. Counting from left to
right, a permutation contains five positions and is uniquely
determined as soon as we specify the two positions for the a's and the three
positions for the b's. Thus there are just as many permutations as
there are ways of putting the five positions into two cells, the first
containing two positions for the a's and the second containing three
positions for the b's. But this number is what we denoted by $\binom{5}{2, 3}$
and by (1.5) is seen to be 10, as expected.
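
Both viewpoints in this example, listing the arrangements and choosing positions, can be mirrored in a few lines. The sketch below is illustrative only: it first removes duplicates from all 120 orderings of the letters, then rebuilds the same list by choosing the two positions occupied by the a's.

```python
from itertools import combinations, permutations

# Brute force: all 5! orderings, with duplicates collapsed by the set.
distinct = sorted({"".join(p) for p in permutations("aabbb")})
print(len(distinct))   # 10

# Position argument: pick the two positions (0-4) that hold the a's.
from_positions = []
for a_positions in combinations(range(5), 2):
    word = "".join("a" if i in a_positions else "b" for i in range(5))
    from_positions.append(word)

assert sorted(from_positions) == distinct
```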


Example 1.3. To determine the number of distinguishable
arrangements on one shelf of four different books, for each of which there are
two copies, we note that each arrangement contains eight distinct
positions and is determined as soon as we specify the two positions
for the first book, the two positions for the second book, etc. The
eight positions can be placed in four cells, each containing two
positions, in

$\binom{8}{2, 2, 2, 2} = \dfrac{8!}{(2!)^4} = 2520$

different ways by (1.5). Thus there are 2520 distinguishable
arrangements of the eight books.


The arguments used in the preceding examples are readily general- 
ized to prove the following theorem. 

Theorem 1.2. If we have n objects, $n_1$ of which are of one kind, $n_2$
of a second kind, ..., $n_k$ of a kth kind (where $n_1 + n_2 + \cdots + n_k = n$),
then the number of distinguishable permutations of the n objects is
given by

$\binom{n}{n_1, n_2, \ldots, n_k} = \dfrac{n!}{n_1!\,n_2! \cdots n_k!}.$

Proof. We have only to observe that each permutation contains n
positions and is uniquely determined as soon as we specify the $n_1$
positions for the objects of the first kind, the $n_2$ positions for the
objects of the second kind, ..., the $n_k$ positions for the objects of the
kth kind. Thus there are as many permutations as there are ways of
putting the n positions into k cells, the ith cell containing the $n_i$
positions for the objects of the ith kind for $i = 1, 2, \ldots, k$. But this
number of ways is given in (1.5), and so the proof is complete.
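
Theorem 1.2 lends itself to a short helper that counts the objects of each kind and applies the formula. This is a sketch under the assumption that objects are represented as characters of a string (equal characters meaning objects of the same kind); the function name is illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinguishable_permutations(items):
    """n! / (n_1! n_2! ... n_k!), where n_i is the multiplicity of the i-th kind."""
    count = factorial(len(items))
    for multiplicity in Counter(items).values():
        count //= factorial(multiplicity)
    return count


# Four titles, two copies each, as in Example 1.3.
print(distinguishable_permutations("AABBCCDD"))   # 2520

# Cross-check the formula against brute-force enumeration on a small case.
assert distinguishable_permutations("aabbb") == len(set(permutations("aabbb")))
```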


We turn now to some examples illustrating the use of these count- 
ing techniques in computing probabilities. 


Example 1.4. Find the probability when a bridge game is dealt
that each player has exactly one ace.

There are as many ways of dealing four bridge hands as there are
ways of placing 52 objects (the cards of the full deck) into four cells
(the North, East, South, and West hands) so that each cell contains
13 cards. Thus, by (1.5), there are

$N = \binom{52}{13,\,13,\,13,\,13} = \frac{52!}{(13!)^4}$

different deals in bridge. Let us choose as our sample space S a set
with N elements, each denoting a different deal, and let us assign to
each simple event of S the same probability 1/N. (N is actually
equal to a number larger than 53 billion billion billion, but we do not
need to know the precise value of N to complete this problem.)
Now the event "each player has one ace" is the union of as many
simple events of S as there are different deals for which North, East,
South, and West each have one ace. If we call this number x, then
the required probability is x/N by Theorem II.4.7. To determine x,
note that there are as many deals for which each player has one ace
as there are ways of completing the following tasks in the stated
order: (1) deal the four aces, one to each player, (2) deal the remain-
ing 48 cards, 12 to each player. By (1.5), task 1 can be done in

$\binom{4}{1,\,1,\,1,\,1} = \frac{4!}{(1!)^4} = 4!$ ways

and task 2 in

$\binom{48}{12,\,12,\,12,\,12} = \frac{48!}{(12!)^4}$ ways.

By the fundamental principle of counting, the number of deals for
which each player has one ace is

$x = \frac{4!\,48!}{(12!)^4}$

and the required probability, say p, is given by

$p = \frac{x}{N} = \frac{4!\,48!\,(13!)^4}{(12!)^4\,52!}.$


To compute p, we first use Table 21 to find the logarithm of p:

log p = log 4! + log 48! + 4(log 13!) - 4(log 12!) - log 52!
      = 1.3802 + 61.0939 + 4(9.7943) - 4(8.6803) - 67.9067
      = -.9766
      = 9.0234 - 10.

Now we use Table 22 to estimate the value of p. Since the mantissa
.0234 is between .0000 and .0414, we know that p is between .10 and
.11. Thus, odds against the event "each player has one ace" are ap-
proximately 9 to 1.
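
With exact integer arithmetic the logarithm tables are not needed, and the result of Example 1.4 can be confirmed directly. The sketch below is an illustration only; the helper name `multinomial` is our own, and it implements formula (1.5) as a quotient of factorials.

```python
from fractions import Fraction
from math import factorial, log10


def multinomial(n, parts):
    """Formula (1.5): ways of putting n objects into cells of the given sizes."""
    assert sum(parts) == n
    value = factorial(n)
    for part in parts:
        value //= factorial(part)
    return value


N = multinomial(52, [13] * 4)                                 # all bridge deals
x = multinomial(4, [1] * 4) * multinomial(48, [12] * 4)       # one ace to each player
p = Fraction(x, N)
log_p = log10(x) - log10(N)

print(f"N is about {float(N):.3e}")
print(f"p = {p}, about {float(p):.4f}; log p is about {log_p:.4f}")
```

The value of p falls between .10 and .11, in agreement with the estimate obtained from the tables.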


TABLE 21. COMMON LOGARITHMS OF FACTORIALS

 n    log n!      n    log n!      n    log n!
 1    0.0000     26   26.6056     51
 2    0.3010     27   28.0370     52   67.9067
 3    0.7782     28   29.4841     53   69.6309
 4    1.3802     29   30.9465     54   71.3633
 5               30   32.4237     55   73.1037
 6               31   33.9150     56   74.8519
 7               32   35.4202     57   76.6077
 8               33   36.9387     58   78.3712
 9               34   38.4702     59   80.1420
10               35   40.0142     60   81.9202
11               36   41.5705     61   83.7055
12    8.6803     37   43.1387     62   85.4979
13    9.7943     38   44.7185     63   87.2972
14   10.9404     39   46.3096     64   89.1034
15   12.1165     40   47.9117     65   90.9163
16   13.3206     41   49.5244     66   92.7359
17   14.5511     42   51.1477     67   94.5620
18   15.8063     43   52.7812     68   96.3945
19   17.0851     44   54.4246     69   98.2333
20   18.3861     45   56.0778     70  100.0784
21   19.7083     46   57.7406     71
22   21.0508     47   59.4127     72
23   22.4125     48   61.0939     73
24   23.7927     49   62.7841     74  107.5196
25   25.1907     50   64.4831     75  109.3946

TABLE 22 [mantissas of four-place common logarithms; the columns for
second digits 0 to 2 and part of column 3 are lost]*

N      3      4      5      6      7      8      9
1           1461   1761   2041   2304   2553   2788
2           3802   3979   4150   4314   4472   4624
3           5315   5441   5563   5682   5798   5911
4           6435   6532   6628   6721   6812   6902
5    7243   7324   7404   7482   7559   7634   7709
6    7993   8062   8129   8195   8261   8325   8388
7           8692   8751   8808   8865   8921   8976
8    9191   9243   9294   9345   9395   9445   9494
9    9685   9731   9777   9823   9868   9912   9956

* For example, log 1.2 = .0792, log .12 = 9.0792 - 10 = -.9208,
log .76 = 9.8808 - 10 = -.1192.


Example 1.5. From five married couples, four people are selected.
What is the probability that two men and two women are chosen?

There are $\binom{10}{4}$ ways of selecting a subset of four people from all
ten people. Since

$\binom{10}{4} = \frac{10!}{4!\,6!} = \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1} = 210,$

we take as sample space a set S with 210 elements and assign each
simple event of S the same probability 1/210.

Now we can select two men from the five available men in
$\binom{5}{2} = 10$ ways. Similarly, there are ten ways of selecting two
women. By the fundamental principle of counting, there are
10 · 10 = 100 ways of selecting two men and two women. Hence
the required probability is 100/210 or 10/21.
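
The sample space of Example 1.5 is small enough to list in full. The following sketch, whose labels "M" and "W" and variable names are our own, enumerates every 4-subset of the ten people and compares the resulting relative count with the counting argument.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Five men and five women, labelled so that every person is distinct.
people = [("M", i) for i in range(5)] + [("W", i) for i in range(5)]

samples = list(combinations(people, 4))          # every 4-subset, equally likely
favorable = [s for s in samples
             if sum(1 for sex, _ in s if sex == "M") == 2]

by_enumeration = Fraction(len(favorable), len(samples))
by_formula = Fraction(comb(5, 2) * comb(5, 2), comb(10, 4))

print(len(samples), len(favorable), by_enumeration, by_formula)
```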


Example 1.6. A coin is tossed five independent times. What is the 
probability of the event E that we get exactly three heads? 

We define $S_1 = \{H, T\}$ as sample space for a single toss and, since
we have no information about the coin, let us put

$P(\{H\}) = p, \quad P(\{T\}) = q = 1 - p,$

where p is some number between 0 and 1 inclusive. For the five-toss
experiment, we must define as sample space S the Cartesian product

$S = S_1 \times S_1 \times S_1 \times S_1 \times S_1.$

Because of the assumed independence of the tosses, the probability
of each simple event of S is determined by the product rule given in
Section II.9. For example,

$P(\{HHHTT\}) = P(\{H\})P(\{H\})P(\{H\})P(\{T\})P(\{T\}) = p^3q^2$

and similarly

$P(\{HHTHT\}) = p \cdot p \cdot q \cdot p \cdot q = p^3q^2.$


In fact, any simple event whose sole element corresponds to an out-
come resulting in three heads and two tails will have probability $p^3q^2$.
The number of such simple events is the same as the number of
5-tuples containing exactly three H's and two T's. Such a 5-tuple is
uniquely determined as soon as we select the positions (i.e., numbers
identifying which are the tosses) that resulted in heads. We can select
three positions from the available five in $\binom{5}{3} = 10$ ways. Since the
event E is the union of these ten simple events, each with probability
$p^3q^2$, we have

$P(E) = 10p^3q^2.$

If the coin is fair, then

$p = q = \tfrac{1}{2}$ and $P(E) = \tfrac{5}{16} = .31$, approximately.

But if the coin is biased so that, let us say, $p = \tfrac{1}{3}$ and $q = \tfrac{2}{3}$, then
$P(E) = \tfrac{40}{243} = .16$, approximately.

This coin-tossing example is typical of an important class of prob-
lems that we will study in Chapter 5. Note especially that we have
here an example in which (assuming p ≠ q) the simple events of S
are not assigned the same probability.
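
The argument of Example 1.6 can be traced by machine: list the 32 outcomes, give each its product-rule probability, and add the probabilities of those with exactly three heads. The sketch below is illustrative; the function name is our own, and the biased coin with p = 1/3 is an assumed value chosen only to show a case where the simple events are not equally likely.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_exactly_k_heads(p, tosses=5, k=3):
    """Sum the product-rule probabilities of all outcomes with exactly k heads."""
    q = 1 - p
    total = Fraction(0)
    for outcome in product("HT", repeat=tosses):
        if outcome.count("H") == k:
            prob = Fraction(1)
            for face in outcome:
                prob *= p if face == "H" else q
            total += prob
    return total


fair = prob_exactly_k_heads(Fraction(1, 2))
biased = prob_exactly_k_heads(Fraction(1, 3))    # assumed bias, for illustration

print(fair, float(fair))
print(biased, float(biased))
```

In both cases the enumerated sum agrees with the closed form $\binom{5}{3} p^3 q^2$.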


Example 1.7. What is the probability that a poker hand will have
one pair?

A poker hand is a 5-subset of the set of 52 cards in the full deck and
so there are $\binom{52}{5}$ different poker hands. If, for the moment, we call
this number N, then our sample space S has N elements and each
simple event of S is assigned probability 1/N. Using Formula (1.4)
and some arithmetic, we find that N = 2,598,960.

Now a poker hand with one pair has two cards of the same face
value (i.e., two aces, two kings, etc.) and three cards whose face
values are all different and different from that of the pair. We ob-
tain a unique poker hand with one pair by completing the following
tasks in order: (1) Choose the face value for the pair from the 13
available face values. This can be done in $\binom{13}{1} = 13$ ways. (2)
Choose two cards with the face value selected in (1). This can be
done in $\binom{4}{2} = 6$ ways. (3) Choose the three face values for the
other three cards in the hand. Since there are 12 face values from
which to choose, this can be done in $\binom{12}{3} = 220$ ways. (4) Choose
one card (from the four available) of each face value chosen in (3).
This can be done in $4 \cdot 4 \cdot 4 = 64$ ways. By the fundamental principle of
counting, there are

13 · 6 · 220 · 64 = 1,098,240

poker hands with one pair. Hence the required probability is

1,098,240 / 2,598,960 = .42, approximately.

As poker players know, and as this answer shows, it is not at all
unusual to have a one-pair hand. Only hands with no pair at all
have a higher probability of occurring. (See Problem 1.25.)

Let us conclude by outlining the procedure we followed in answer-
ing the probability questions posed in the preceding problems.

(1) Define a sample space S for the experiment and assign prob- 
abilities to its simple events. This may involve counting the elements 
of S (as in Examples 1.4, 1.5, and 1.7) or it may require other non- 
counting considerations (as in Example 1.6). 

(2) Determine P(E) by calculating the sum of the probabilities of
the simple events whose union is the event E. In the problems under
discussion here, this requires counting the number of elements of E.
To do this, it is often helpful first to construct a sequence of tasks 
having the following property: each way of completing the tasks in 
the specified sequential order produces an experimental outcome corre- 
sponding to exactly one element of E and conversely, each element of 
E corresponds to an outcome produced by completing the tasks in 
exactly one way. (We did this in Example 1.4 where a bridge deal for 
which each player has one ace was produced by completing two tasks, 
and also in Example 1.7 where each poker hand with one pair was 
thought of as produced by completing four tasks in order.) The prob- 
lem of counting the number of elements in E is thereby reduced to 
that of counting the number of ways of completing these tasks. 

(3) Now use one or more of the formulas discussed in this section 
to count the number of ways of completing each task. Then invoke 
the fundamental principle of counting to determine the number of 
ways of completing all the tasks in the stated order. This number is 
the number of elements in E, and P(E) can thus be evaluated. 
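
The three steps above can be sketched in Python for the one-pair hand of Example 1.7. This is an illustration under our own naming: `one_pair_by_tasks` multiplies the counts of the four tasks, while `one_pair_by_enumeration` checks the task argument by listing every hand of a reduced deck (six face values, an assumption made only to keep the enumeration small).

```python
from collections import Counter
from itertools import combinations
from math import comb


def one_pair_by_tasks(ranks=13, suits=4):
    """Fundamental principle of counting applied to the four tasks."""
    return (comb(ranks, 1)          # (1) face value of the pair
            * comb(suits, 2)        # (2) the two cards of that value
            * comb(ranks - 1, 3)    # (3) three other face values
            * suits ** 3)           # (4) one card of each of those values


def one_pair_by_enumeration(ranks, suits=4):
    """List every 5-card hand and count those with face-value pattern 2-1-1-1."""
    deck = [(rank, suit) for rank in range(ranks) for suit in range(suits)]
    count = 0
    for hand in combinations(deck, 5):
        pattern = sorted(Counter(rank for rank, _ in hand).values())
        if pattern == [1, 1, 1, 2]:
            count += 1
    return count


full_deck = one_pair_by_tasks()
probability = full_deck / comb(52, 5)
small_deck_check = one_pair_by_tasks(ranks=6) == one_pair_by_enumeration(ranks=6)

print(full_deck, round(probability, 2), small_deck_check)
```

The reduced-deck check matters because it tests the one-to-one correspondence between ways of completing the tasks and elements of E, which is the property the procedure relies on.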

The problems that follow will help the reader develop his ability 
to count by means of the formulas presented in this section. He thus 
becomes able to find probabilities in a wide class of more complicated 
but more interesting experiments than heretofore considered. 


PROBLEMS 
1.1. Evaluate:


8! 2! 
(a) 35i (b) es (e) (3) 


o (2) © (432) © (os) 
1.2. How large must n be before n! exceeds (a) a thousand? (b) a million?
(c) a billion? (d) a trillion? [Hint: Use Table 21.]

1.3. Compute to two decimal place accuracy (using logarithms) the value of


( 60 22i 
12 4 8 2 3 
(a) 4L ) 12V 1V/6Yy 
* (9 ui (8) e (3) @) 
12 12 
1.4. Let r be a positive integer. For any number x, let

$(x)_r = x(x - 1)(x - 2) \cdots (x - r + 1).$

Show that
(a) if x is an integer and $x \ge r$, then $(x)_r = P(x, r)$.
(b) $(-1)_r = (-1)^r\, r!$
(c) $(-2)_r = (-1)^r\, (r + 1)!$
(d) $(-\tfrac{1}{2})_r = (-1)^r\, \frac{r!}{2^{2r}} \binom{2r}{r}$


1.5. (a) One straight line is determined by two points in a plane. Three
lines are determined by three noncolinear points. Six lines are
determined by four points, no three of which are colinear. How
many lines are determined by n points, no three of which are
colinear?
(b) A triangle has no diagonals; a quadrilateral has two diagonals; a
pentagon has five diagonals. How many diagonals does a polygon
of n sides have?

1.6. The 11 digits 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4 are permuted in all distinguish-
able ways. How many permutations (a) begin with 22? (b) begin with
343?

1.7. Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit
number. If the numbers corresponding to all possible permutations are
listed in order of increasing magnitude, which is the 417th?

1.8. The six digits 1, 1, 1, 2, 3, 3 are permuted and, as in the preceding
problem, we list the corresponding six-digit numbers in order of in-
creasing magnitude.
(a) How many numbers start with the digit 2?
(b) How far down in the list is the number 321,311?

1.9. We have two each of n different objects, 2n objects altogether. How
many distinguishable selections of four objects are there for which
(a) all four objects are different? (b) two are alike and two different?
(c) two are alike and the other two are also alike? (d) The total num-
ber of distinguishable selections of four objects from these 2n objects
is equal to six times the number of 4-subsets of the set of n different
objects. Find the value of n.

1.10. A group of ten boys and ten girls is divided into two groups of ten each.
Find the probability that each group contains as many boys as girls.


1.11. A bookstore clerk has ten books, five each of two titles, to place on a
bookshelf. If he places them at random so that all distinguishable
arrangements are equally likely, what is the probability that (a) five
copies of one title follow five copies of the other title on the shelf?
(b) the two titles alternate on the shelf?

1.12. You need four eggs to make omelets for breakfast. You find a dozen
eggs in the refrigerator but do not realize that two of these eggs are
rotten. What is the probability that of the four eggs you choose (a)
none are rotten? (b) exactly one is rotten? (c) exactly two are rotten?

1.13. In the preceding problem, suppose you break the four eggs into a saucer
and discover that you have chosen at least one rotten egg.
(a) What is the conditional probability that both rotten eggs are in
the saucer?
(b) If you choose four other eggs from the remaining eight eggs, what
is the conditional probability that they will all be good?

1.14. Refer to Example 1.6 of the text and find the probability that the coin
falls heads exactly k times, where k = 0, 1, 2, 3, 4, 5.

1.15. Baseball team A plays team B ten times in a given month. Assume
that team A is better than team B and has probability [illegible] of winning
and probability [illegible] of losing each game. If the games are considered as
ten independent trials, find the probability team A wins (a) exactly
six games, (b) exactly seven games, (c) a majority of the games.

1.16. A pack of ten cards consists of three aces, two kings, two queens, and
three jacks. We shuffle the deck and pick one card. Let this trial be
performed eight independent times. What is the probability that an
ace is selected twice, a king three times, and a jack three times?

1.17. Find the probability that in eight independent rolls of a fair die, the
numbers 1, 3, and 5 turn up two, three, and three times, respectively.

1.18. From a panel of 20 seniors, 15 juniors, ten sophomores, and five fresh-
men, a committee of five is selected at random. What is the probability
that the committee consists of two seniors and one from each of the
other classes?

1.19. In the preceding problem, suppose you know that the committee con-
tains exactly one freshman. What is the conditional probability that
there are also two seniors, one sophomore, and one junior on the
committee?


1.20. There are ten defective and 60 good transistors in a lot from which you
select a sample (without replacement) of 12. Calculate with two deci-
mal place accuracy the probability that the sample contains (a) no
defectives. (b) exactly one defective. (c) exactly two defectives. (d)
exactly three defectives. (e) exactly four defectives. (f) exactly five
defectives.

1.21. In the preceding problem, suppose the sample of 12 was selected with
replacement. Find the probability that the sample contains three de-
fectives and compare with the corresponding answer for sampling
without replacement. Do the same for 0, 1, 2, 4, and 5 defectives.

1.22. We scramble the letters of the word "Muhammadan" and then arrange
them in some order.
(a) What is the probability that the three a's will be consecutive
letters?
(b) What is the probability that the three a's will be consecutive and
the three m's will also be consecutive?
(c) What is the probability that no three consecutive letters are alike?

1.23. The 11 letters of the word "Mississippi" are scrambled and then ar-
ranged in some order.
(a) What is the probability that the four i's are consecutive letters in
the resulting arrangement?
(b) What is the conditional probability that the four i's are consecu-
tive, given that the arrangement starts with "M" and ends
with "s"?
(c) What is the conditional probability that the four i's are consecu-
tive, given that the arrangement ends with four consecutive esses?

1.24. A poker player holds a pair of aces and a king, queen, and jack. He
discards three cards, holding his pair, and draws three more cards from
the deck of 47 cards. What is the probability that his hand contains
(a) three aces after the draw? (b) two pairs, aces high, after the draw?
(Note: When a hand is spoken of as containing a pair of aces, three
aces, etc., we will mean that it contains no higher count. Thus, to
say a hand contains three aces means that it contains exactly three
aces and two different cards. To say a hand contains two pairs, aces
high, means that it contains a pair of aces, a different pair, and a fifth
card of a still different face value.)

1.25. In Example 1.7 we found the probability of a one-pair poker hand.
Now find the probability of the following poker hands:
(a) no pair (five different face values, not in sequence, not same suit.)
(b) two pairs (one pair of each of two different face values plus a card
of a third face value.)
(c) three of a kind (exactly three cards of one face value plus two
different cards.)
(d) straight (five cards in sequence, but not all of the same suit.)
(e) flush (five cards of the same suit but not in sequence.)
(f) full house (three cards of one face value, and two cards of another
face value.)
(g) four of a kind (four cards of one face value). See Example 1.2.3.
(h) straight flush (five cards in sequence and of the same suit.)


1.26. In seven-card stud poker, your first three cards are of the same suit.
What is the probability that you will find at least two more cards of
the same suit among the other four cards in your hand?

1.27. Let North be dealt 13 cards from a bridge deck and suppose $S_1$ is the
sample space corresponding to this deal (trial).
(a) Show that dealing hands to all four players in a bridge game can
be considered as a four-trial experiment in which the trials are
identical but not independent. What is the sample space S for the
four-trial experiment?
(b) Calculate the probability of the event E that North has exactly
one ace, considering E as a subset of S.
(c) Calculate P(E), but now considering E as a subset of $S_1$.
(d) Give a precisely worded and complete explanation of why your
answers in (b) and (c) are the same. (Hint: Refer to Section II.9.)

1.28. In a bridge game, what is the probability that you and your partner
together have exactly k aces, where k = 0, 1, 2, 3, 4?

1.29. When a bridge hand is dealt, what is the probability of (a) a 5-4-3-1
distribution, i.e., of a hand containing five cards of one suit, four of
another, etc.? (b) a 4-4-3-2 distribution? (c) a 4-3-3-3 distribution?

1.30. You and your partner in bridge are declarers and hold nine spades,
including the ace and king. The defenders hold four spades, including
the queen. What is the probability that the distribution of the four
spades in the opposing hands is (a) four in one hand, none in the other?
(b) three in one hand, one in the other? (c) two in one hand, two in the
other?

1.31. Continuing the preceding problem, you know that the queen will fall
when you lead the ace and king if the four spades are divided equally
between the opposing hands or if the queen is the only spade in one
of them. Show that odds for the queen's falling on the lead of the ace
and king are approximately 1.13 to 1.


2. Binomial coefficients


The numbers $\binom{n}{r}$ introduced in the preceding section have many
interesting and important properties that we will need to know for
our later work. Although it is a slight digression at this point, we
pause here to develop some of these properties.

Our first task is to explain the title of this section. The reader is
familiar with the formulas

(2.1)  $(x + y)^2 = x^2 + 2xy + y^2$
(2.2)  $(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3$
(2.3)  $(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4.$

For any positive integer n, $(x + y)^n$ is the product of n equal factors:

(2.4)  $(x + y)^n = (x + y)(x + y) \cdots (x + y).$

In seeking a general formula for $(x + y)^n$, it is helpful first to
identify the coefficients in the special cases (2.1)-(2.3). Indeed, the
reader should check that these can be written as follows, thus sug-
gesting our first theorem:


$(x + y)^2 = \binom{2}{0}x^2 + \binom{2}{1}xy + \binom{2}{2}y^2 = \sum_{r=0}^{2} \binom{2}{r} x^{2-r} y^r$

$(x + y)^3 = \binom{3}{0}x^3 + \binom{3}{1}x^2y + \binom{3}{2}xy^2 + \binom{3}{3}y^3 = \sum_{r=0}^{3} \binom{3}{r} x^{3-r} y^r$

$(x + y)^4 = \binom{4}{0}x^4 + \binom{4}{1}x^3y + \binom{4}{2}x^2y^2 + \binom{4}{3}xy^3 + \binom{4}{4}y^4 = \sum_{r=0}^{4} \binom{4}{r} x^{4-r} y^r.$

Theorem 2.1. (The binomial theorem.) If n is any positive integer
and x and y are any numbers, then

$(x + y)^n = \binom{n}{0}x^n + \binom{n}{1}x^{n-1}y + \binom{n}{2}x^{n-2}y^2 + \cdots + \binom{n}{n}y^n$

or, more concisely,

(2.5)  $(x + y)^n = \sum_{r=0}^{n} \binom{n}{r} x^{n-r} y^r.$

Proof. To compute $(x + y)^n$, we choose either the letter x or the
letter y from each of the n factors in (2.4) and multiply these n
choices. If we do this for all possible choices of x's and y's and add
the results, we obtain $(x + y)^n$. For example, we get the product $x^n$
by choosing x from each factor, we get the product $x^{n-1}y$ whenever we
choose x from all but one factor, we get $x^{n-2}y^2$ whenever we choose x
from all but two factors, etc.

For a given integer r ($0 \le r \le n$), the product $x^{n-r}y^r$ is obtained
whenever we choose exactly r y's (and therefore n - r x's). To de-
termine our choice uniquely, we have only to decide from which r
of the n factors we select y's. Hence there are $\binom{n}{r}$ choices, each lead-
ing to the product $x^{n-r}y^r$, and so the term $\binom{n}{r}x^{n-r}y^r$ appears in the
expansion of $(x + y)^n$. Since r is any integer from 0 to n inclusive,
the theorem is proved.

Because the numbers $\binom{n}{r}$ appear as coefficients in the expansion
of a power of a binomial, they are called binomial coefficients.
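
The proof of the binomial theorem is itself a counting procedure, and it can be imitated directly. In the sketch below (names are our own, and n = 5 with x = 3, y = -2 are arbitrary test values) every way of picking one letter from each factor is listed, and the picks are tallied by the number of y's chosen.

```python
from collections import Counter
from itertools import product
from math import comb


def coefficients_by_choices(n):
    """Tally the 2**n letter choices by how many y's each one contains."""
    tally = Counter(choice.count("y") for choice in product("xy", repeat=n))
    return [tally[r] for r in range(n + 1)]


n = 5
by_choices = coefficients_by_choices(n)
by_formula = [comb(n, r) for r in range(n + 1)]

# Numerical spot check of (2.5) for one arbitrary pair of values.
x, y = 3, -2
lhs = (x + y) ** n
rhs = sum(comb(n, r) * x ** (n - r) * y ** r for r in range(n + 1))

print(by_choices, by_formula, lhs, rhs)
```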


Example 2.1. To expand $(1 + t)^n$ we put x = 1, y = t in (2.5) and
find

(2.6)  $(1 + t)^n = \binom{n}{0} + \binom{n}{1}t + \binom{n}{2}t^2 + \cdots + \binom{n}{n}t^n.$

If we now put t = 1, there follows the interesting identity

(2.7)  $2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$

Since $\binom{n}{r}$ is the number of r-subsets of a set with n elements, we
see that the right-hand side of (2.7) is the number of 0-subsets (which
is 1, there being only one null set) plus the number of 1-subsets plus
the number of 2-subsets, etc. This sum is therefore the total number
of subsets and so (2.7) supplies another proof of Theorem I.2.1,
which says that a set with n elements has $2^n$ subsets.


Example 2.2. The binomial theorem can be used to compute an
approximate value of $(.99)^6$. For with x = 1 and y = -.01 in (2.5)
or equivalently, with t = -.01 in (2.6), we obtain

$(.99)^6 = (1 - .01)^6$
$= \binom{6}{0} + \binom{6}{1}(-.01) + \binom{6}{2}(-.01)^2 + \binom{6}{3}(-.01)^3 + \cdots + \binom{6}{6}(-.01)^6$
$= 1 - .06 + .0015 - .00002 + \cdots + .000000000001$
$= .941$, to three decimal places.
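
The terms of this expansion shrink quickly, which is why only the first few are needed. The sketch below lists the exact partial sums; it assumes the exponent 6, which is the value implied by the printed terms 1, .06, .0015 and .00002, and the function name is our own.

```python
from fractions import Fraction
from math import comb


def binomial_partial_sums(x, y, n):
    """Running totals of the terms C(n, r) x^(n-r) y^r for r = 0, 1, ..., n."""
    total, sums = Fraction(0), []
    for r in range(n + 1):
        total += comb(n, r) * x ** (n - r) * y ** r
        sums.append(total)
    return sums


sums = binomial_partial_sums(Fraction(1), Fraction(-1, 100), 6)
for r, s in enumerate(sums):
    print(r, float(s))
```

Exact fractions are used so that the last partial sum can be compared with $(99/100)^6$ without rounding error.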


A convenient device for calculating and displaying the binomial
coefficients is known as Pascal's triangle.* The first few rows are
given in Table 23. The row for n = 0 lists the one coefficient in the
expansion of $(x + y)^0$, the row for n = 1 lists the two coefficients in the
expansion of $(x + y)^1$, the row for n = 2 lists the three coefficients in
the expansion of $(x + y)^2$, and so on. Since every set with n elements has
exactly one null subset (∅) and exactly one n-subset (namely, itself),
we find 1's under the column headed r = 0 and also along the hypotenuse
of the triangular array.

TABLE 23

        r
 n    0    1    2    3    4    5    6
 0    1
 1    1    1
 2    1    2    1
 3    1    3    3    1
 4    1    4    6    4    1
 5    1    5   10   10    5    1
 6    1    6   15   20   15    6    1

We observe that there is a simple relation among numbers in the
triangle. For if we start at any number not on the hypotenuse and
move to the right one number and then drop down to the row below,
we note that the sum of the first two numbers is precisely the number
in the row below. For example, starting at the left in the row for
n = 4 we obtain the numbers in the row for n = 5 as follows:

1 + 4 = 5,   4 + 6 = 10,   6 + 4 = 10,   4 + 1 = 5.

The following result proves that our observation is generally true
and can thus be used to extend the table one row at a time and
thereby to compute binomial coefficients $\binom{n}{r}$ for larger and larger
values of n.

Theorem 2.2. For any positive integers r and n with r < n,

(2.8)  $\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}.$

* Pascal (1623-1662) was far from the first to form and study this triangular
array of numbers, but somehow his name has become attached to it. For inter-
esting historical notes, see C. B. Boyer, "Cardan and the Pascal Triangle,"
American Mathematical Monthly, vol. 57 (1950), 387-390.
Proof. Although it is possible to give a direct algebraic proof by
writing the binomial coefficients in terms of factorials (see Problem
2.5c), we prefer the following argument. The $\binom{n}{r}$ r-subsets of a set
with n elements can be divided into those that include a given ele-
ment, and those that do not. The number of r-subsets including the
given element is $\binom{n-1}{r-1}$, since fixing one element leaves us free to
select r - 1 others from the remaining n - 1. The number of r-sub-
sets that do not include the given element is $\binom{n-1}{r}$, since we are
now choosing r from n - 1 elements. Since we have now accounted
for all subsets, formula (2.8) is proved.
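
The law of formation (2.8) gives a simple way to build the triangle row by row. The sketch below is one possible arrangement (the function name is our own): the end entries of each row are set to 1 and every inner entry is the sum of the two entries above it.

```python
from math import comb


def pascal_rows(last_n):
    """Rows n = 0, 1, ..., last_n of Pascal's triangle, built with (2.8)."""
    rows = [[1]]
    for n in range(1, last_n + 1):
        prev = rows[-1]
        # Inner entries: C(n, r) = C(n-1, r-1) + C(n-1, r) for 0 < r < n.
        inner = [prev[r - 1] + prev[r] for r in range(1, n)]
        rows.append([1] + inner + [1])
    return rows


rows = pascal_rows(10)
for n, row in enumerate(rows):
    print(n, row)
```

Each row can be compared with the factorial formula for the binomial coefficients as an independent check.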


Two other properties of the binomial coefficients are apparent from
the Pascal triangle. The symmetry in each row of the triangle is due
to the identity

(2.9)  $\binom{n}{r} = \binom{n}{n-r},$

which expresses the fact that every selection of an r-subset is auto-
matically a selection of its complementary (n - r)-subset, and vice
versa. It follows that

$\binom{n}{0} = \binom{n}{n}, \quad \binom{n}{1} = \binom{n}{n-1}, \quad \binom{n}{2} = \binom{n}{n-2},$ etc.,

and so in each row of the triangle, the first and last numbers are
equal, as are the second and next to last, and so on.

We also observe that the binomial coefficients in any row increase
as we move to the right and then eventually start to decrease. We
leave for the problems the proof that this is generally true, and in-
stead derive another important identity involving binomial coeffi-
cients.


As we shall see in the example that follows, it is convenient to
extend the definition of the binomial coefficient $\binom{n}{r}$ to values of n
and r when the symbol no longer has any combinatorial meaning.
Indeed, it is possible and useful to have $\binom{n}{r}$ defined even when n is
not an integer. (See Problem 2.9.) But here we make only the defi-
nition that for any positive integer n and r also an integer,

(2.10)  $\binom{n}{r} = 0$ if either r > n or r < 0.

Since a set with n elements has no subsets with more than n elements
and also does not have any subsets with a negative number of ele-
ments, Definition (2.10) is quite reasonable.


Example 2.3. We are given n defective and m acceptable items,
n + m items altogether, from a production line. They are mixed up
and we are to draw a subset of r items from the lot. We count the
number of possible r-subsets in two different ways, and thus derive
a useful identity. First, there are clearly $\binom{n+m}{r}$ r-subsets that can
be drawn from the n + m items. To count again, note that the
r-subsets can be classified according to the number of defective items
they contain. We can select k defectives (and therefore r - k ac-
ceptable items) in

$\binom{n}{k}\binom{m}{r-k}$ ways.

If we let k vary over all possible values and add these numbers, then
we obtain all the r-subsets. Hence

$\binom{n}{0}\binom{m}{r} + \binom{n}{1}\binom{m}{r-1} + \cdots + \binom{n}{r}\binom{m}{0} = \binom{n+m}{r}$

or

(2.11)  $\sum_{k=0}^{r} \binom{n}{k}\binom{m}{r-k} = \binom{n+m}{r}.$

Note that some of the terms in the sum may be zero, since we cannot
have more defective and acceptable items in the r-subset than there
are defective and acceptable items in the whole lot. For example, if
the number of acceptable items m happens to be equal to r - 2, then
$\binom{m}{r}$ and $\binom{m}{r-1}$ both are zero according to (2.10). The value of
the definition made in (2.10) lies in the fact that it allows us to write
the sum in (2.11) without worrying about terms that should be
omitted; instead of omitting them we made sure they would be zero.
Identity (2.11) will be used in Chapter 5 in connection with the so-
called hypergeometric distribution in probability.

The method we used to prove Theorem 2.1 can be extended to 
prove the following result. 


Theorem 2.3. (The multinomial theorem.) Let n be any positive
integer and $x_1, x_2, \ldots, x_k$ any k numbers. Then

(2.12)  $(x_1 + x_2 + \cdots + x_k)^n = \sum \binom{n}{n_1,\, n_2,\, \ldots,\, n_k} x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k},$

where the sum is taken over all nonnegative integers $n_1, n_2, \ldots, n_k$
such that $n_1 + n_2 + \cdots + n_k = n$.

Proof. We again have n factors to multiply, but now each is the
multinomial $(x_1 + x_2 + \cdots + x_k)$ instead of the binomial (x + y)
in (2.4). From each factor we must choose $x_1$, or $x_2$, ..., or $x_k$, multi-
ply our n choices and add these products for all possible choices. For
example, we get the product $x_1^n$ by choosing $x_1$ from each factor, we
get the product $x_1^{n-2}x_2x_3$ by choosing $x_1$ from n - 2 of the factors, $x_2$
from one factor and $x_3$ from another factor, etc.

For given nonnegative integers $n_1, n_2, \ldots, n_k$ (whose sum is n) we
get the product $x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$ whenever we choose exactly

$n_1$ $x_1$'s, $n_2$ $x_2$'s, ..., $n_k$ $x_k$'s

from the n available factors. There are as many ways of making
such a choice as there are ways of placing the n factors into k cells,
the first cell containing the $n_1$ factors from which we choose $x_1$, the
second cell containing the $n_2$ factors from which we choose $x_2$, etc.
By (1.5) there are therefore

(2.13)  $\binom{n}{n_1,\, n_2,\, \ldots,\, n_k}$

choices, each leading to the product $x_1^{n_1} x_2^{n_2} \cdots x_k^{n_k}$. Thus we have the
multinomial theorem.

The numbers (2.13) are called multinomial coefficients. Since, as
we noted in (1.10), a binomial coefficient is a special case of a multi-
nomial coefficient, we see that the multinomial theorem becomes the
binomial theorem if we put k = 2.


Example 2.4. To expand $(p + q + r)^3$ we write

$(p + q + r)^3 = \sum \binom{3}{n_1,\, n_2,\, n_3} p^{n_1} q^{n_2} r^{n_3},$

where the sum is taken over all nonnegative integers $n_1, n_2, n_3$ such
that $n_1 + n_2 + n_3 = 3$. Hence

$(p + q + r)^3 = \binom{3}{3,0,0}p^3 + \binom{3}{0,3,0}q^3 + \binom{3}{0,0,3}r^3 + \binom{3}{2,1,0}p^2q$
$\quad + \binom{3}{1,2,0}pq^2 + \binom{3}{2,0,1}p^2r + \binom{3}{1,0,2}pr^2$
$\quad + \binom{3}{0,2,1}q^2r + \binom{3}{0,1,2}qr^2 + \binom{3}{1,1,1}pqr$
$= p^3 + q^3 + r^3 + 3p^2q + 3pq^2 + 3p^2r + 3pr^2 + 3q^2r + 3qr^2 + 6pqr.$

We can give a probability interpretation to the terms in this sum
by imagining that each person in a certain population is classified
according to whether he answers "yes," "no," or "don't know," when
asked a certain question. If we select one person at random from the
population and record his answer, this trial is defined by the sample
space {Y, N, DK}. We make the assignment of probabilities to
simple events as follows:

$P(\{Y\}) = p, \quad P(\{N\}) = q, \quad P(\{DK\}) = r,$

where we assume $p \ge 0$, $q \ge 0$, $r \ge 0$, and p + q + r = 1. Now per-
form this trial three independent times, for instance, by choosing a
random sample of three people with replacement from the population
and asking each the question. Then the terms in the expansion of
$(p + q + r)^3$ give the probabilities of all the possible combinations
of answers. For example, the probability that all three people answer
"yes" is $p^3$, the probability that exactly one person answers "yes"
and two people answer "don't know" is $3pr^2$, etc.
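
The probability interpretation can be checked by listing all 27 ordered answer sequences. In the sketch below the values p = 1/2, q = 3/10, r = 1/5 are assumed purely for illustration, and the helper `multinomial` is our own name for the coefficient (2.13).

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial(n, parts):
    """Multinomial coefficient n! / (n_1! n_2! ... n_k!)."""
    value = factorial(n)
    for part in parts:
        value //= factorial(part)
    return value


# Assumed answer probabilities for "yes", "no", "don't know".
probs = {"Y": Fraction(1, 2), "N": Fraction(3, 10), "DK": Fraction(1, 5)}

# Add up the product-rule probability of every ordered triple of answers,
# grouped by how many Y, N and DK answers the triple contains.
distribution = Counter()
for answers in product(probs, repeat=3):
    weight = Fraction(1)
    for answer in answers:
        weight *= probs[answer]
    key = (answers.count("Y"), answers.count("N"), answers.count("DK"))
    distribution[key] += weight

for key in sorted(distribution):
    print(key, distribution[key])
```

Each of the ten totals equals the corresponding term of the expansion, and together they sum to 1.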


PROBLEMS 


2.1. Expand by the binomial theorem.
(a) (p gs (b) (1 — z)! (c) (a — 30)* 

2.2. What is the coefficient of a? in the expansion of (2a — b)8? 

2.3. By means of the binomial theorem, evaluate to three decimal places.
(a) (1.01) (b) (1.02) (c) (98)

2.4. Identify each expansion and thus evaluate each sum without computing
individual terms in the sum.


e à, (i) © ape 

o E)» @ 2 o(;)? 
OAM) GY o ECH 
e£(t)mee wg ( t)r 


2.5. Write the binomial coefficients in terms of factorials and thus prove
the following identities.


n n—1 
Pip) ese) 
n\ _ n "Y. (ml n 
ma(t)-+0(,$,)+r(%) "(13562 
n| [(n—1 n=] 
n Dee) 
2.6. Use the law of formation (2.8) to extend the Pascal triangle in Table
23 to n = 10.


2.7. By writing the binomial coefficients in terms of factorials, derive the
recursion formula

$$\binom{n}{r+1} = \frac{n-r}{r+1} \binom{n}{r} \qquad (r = 0, 1, 2, \ldots, n-1).$$

This formula enables us to compute the numbers in any row of the
Pascal triangle one by one, starting from $\binom{n}{0} = 1$. Compute the
binomial coefficients for n = 10 this way.


2.8. Consider the binomial coefficients $\binom{n}{r}$ with n fixed and
r = 0, 1, 2, ..., n. With this order, show that $\binom{n}{r}$ is greater
than its predecessor if $r < \frac{1}{2}(n + 1)$ and is smaller if
$r > \frac{1}{2}(n + 1)$. Show also that if n is an even integer, then
there is one largest binomial coefficient, but if n is odd, then there
are two equal binomial coefficients that are larger than all the
others. (Hint: Consider the ratio of $\binom{n}{r}$ to $\binom{n}{r-1}$ and
determine when this ratio is greater than, less than, or equal to 1.)

2.9. Using the notation introduced in Problem 1.4, define the generalized
binomial coefficient $\binom{x}{r}$ for any number x and any positive integer r


(es 


(a) Show that this reduces to the familiar definition if x is a positive 
integer. (Consider the case x < r as well as x > r.) 


(b) Show that ( +) = (—1)"2 (2) 


n 


by the equation 


2.10. Prove the following identities: 


vi -(De()-e-eem-n 

Q) (") «(oa (De (2) = 2- if n is even. 
© Aar Ga) 

o AC - (D) 


2.11. In the expansion of (p + q + r + s)", compute the coefficient of 
(a) po (b) p% (c) Persi. 
2.12. Expand (p + q 4- 7)! by the multinomial theorem and give a prob- 


ability interpretation to each term in the expansion assuming p, q, and r 
are nonnegative numbers with sum 1.

Chapter 4


RANDOM VARIABLES 


1. Random variables and probability functions 


When we perform an experiment, we are often interested not in the 


ates on the number of honor points 
itself; in selecting a random 


ils that constitutes the 
vesult of the 50 tosses; etc. 


In all these examples, we have a rule Which assigns to each outcome 
of the e i i 


The reader is already familiar with the function concept. For
example, the equation $y = x^2$ determines a function whose domain is
the set of all real numbers and whose range is the set of nonnegative
real numbers; i.e., to each real number x is associated the (necessarily
nonnegative) real number $y = x^2$. As another familiar example,
think of the function which assigns to each circle the number which
is the circle’s circumference. The domain of this function is the set
of all circles. For any element (circle) in the domain, the value of
this function is the number $2\pi r$, where r is the radius of the circle.
For yet another example, consider the function whose domain is the
set of all people to whom the federal tax laws apply, and which
assigns to each such person the number which is his taxable income for
a given year.

In probability theory, certain functions of special interest are 
given special names. 

Definition 1.1. A function whose domain is a sample space and
whose range is some set of real numbers is called a random variable.
If the random variable is denoted by X and has the sample space
$S = \{o_1, o_2, \ldots, o_n\}$ as domain, then we write $X(o_k)$ for the
value of X at the element $o_k$. Thus $X(o_k)$ is the real number that the
function rule assigns to the element $o_k$ of S.

The reader may find it helpful to think of a function in terms of a
machine. In Figure 17, we picture a machine for the random variable X.
The possible inputs of the function-machine are the elements of the
sample space S. Each such element $o_k$ is “processed” by the machine,
and what emerges is the output number $X(o_k)$. The set of all possible
input elements is the sample space S, the domain of the function. The
set of different output numbers is the range of the random variable X.

Figure 17. A function machine for X: input $o_k \in S$, output $X(o_k)$.

Let us now look at some examples of random variables.


Example 1.1. Let S = {1, 2, 3, 4, 5, 6} and define X as follows:

X(1) = X(2) = X(3) = +1, X(4) = X(5) = X(6) = −1.

Then X is a random variable whose domain is the sample space S and
whose range is the set {1, −1}. X can be interpreted as the gain of a
player in a game in which a die is rolled, the player winning $1 if the
outcome is 1, 2, or 3 and losing $1 if the outcome is 4, 5, or 6.


Example 1.2. Two dice are rolled and we define the familiar
sample space

$S = \{(1, 1), (1, 2), \ldots, (6, 6)\}$

containing 36 elements. Let X denote the random variable whose
value for any element of S is the sum of the numbers on the two dice.
Then the range of X is the set containing the 11 values of X:

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

Each ordered pair of S has associated with it exactly one element of
the range, as required by Definition 1.1. But, in general, the same
value of X arises from many different outcomes. For example,
$X(o_k) = 5$ if $o_k$ is any one of the four elements of the event

{(1, 4), (2, 3), (3, 2), (4, 1)}.
A given input element in Figure 17 always leads to exactly one out- 


put number, but the same output number may be obtained from 
more than one input element. 
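
The function-machine picture translates directly into a short program. The sketch below is an illustration under our own naming (`S`, `X`, `range_of_X` and `event_X_equals_5` are hypothetical names): the sample space of Example 1.2 is a list of ordered pairs, and the random variable is an ordinary function applied to each pair.

```python
from itertools import product

# Sample space for two dice: all 36 ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))


def X(outcome):
    """Random variable of Example 1.2: the sum of the numbers on the two dice."""
    return outcome[0] + outcome[1]


# The range of X: the set of different output numbers.
range_of_X = sorted({X(o) for o in S})

# The event "X = 5": every input element that the machine sends to 5.
event_X_equals_5 = [o for o in S if X(o) == 5]

print(range_of_X)
print(event_X_equals_5)
```

Each input pair yields exactly one output number, while the output 5 is reached from four different inputs.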


Example 1.3. A coin is tossed, and then tossed again. We define
the sample space

S = {HH, HT, TH, TT}.

If X is the random variable whose value for any element of S is the
number of heads obtained, then

X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0.

More than one random variable can be defined on the same sample
space. For example, let Y denote the random variable whose value
for any element of S is the number of heads minus the number of
tails. Then

Y(HH) = 2, Y(HT) = Y(TH) = 0, Y(TT) = −2.


Naming a numerical-valued fun
rand
reader will constantly keep in mind its true meaning, as given in
Definition 1.1.
Suppose now that a sample space

$S = \{o_1, o_2, \ldots, o_n\}$

is given, and that some acceptable assignment of probabilities has
been made to the simple events of S. Then if X is a random variable
defined on S, we can ask for the probability that the value of X is
some number, say x. The event that X has the value x is the subset
of S containing those elements $o_k$ for which $X(o_k) = x$. If we denote
by f(x) the probability of this event, then

(1.1) $f(x) = P(\{o_k \in S \mid X(o_k) = x\})$.

Because this notation is cumbersome, we shall write

(1.2) $f(x) = P(X = x)$,

adopting the shorthand “X = x” to denote the event written out in
(1.1).

Definition 1.2. The function f whose value for each real number x
is given by (1.2), or equivalently by (1.1), is called the probability
function of the random variable X.

In other words, the probability function of X has the set of all real
numbers as its domain, and the function assigns to each real number
x the probability that X has the value x.


Example 1.4. Continuing Example 1.1, if the die is fair, then

$f(1) = P(X = 1) = \frac{1}{2}, \quad f(-1) = P(X = -1) = \frac{1}{2},$

and f(x) = 0 if x is different from 1 or −1.

Example 1.5. If both dice of Example 1.2 are fair and the rolls are
independent, so that each simple event of S has probability 1/36, then
we compute the value of the probability function at x = 5 as follows:

$f(5) = P(X = 5) = P(\{(1, 4), (2, 3), (3, 2), (4, 1)\}) = \frac{4}{36}.$

This is the probability that the sum of the numbers on the dice is 5.
We can compute the probabilities f(2), f(3), ..., f(12) in an analogous
way. These values are summarized in the following probability
table:

x    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
f(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36


Let us agree, as here, to include in such probability tables only
those numbers x for which f(x) > 0. Since we include all such
numbers, the probabilities f(x) in the table add to 1. From the probability
table of a random variable X, we can tell at a glance not only the
various values of X, but also the probability with which each value
occurs. This information can also be presented graphically, as in
Figure 18.


Figure 18. Probability chart of X: $f(x) = P(X = x)$ plotted against
x = 2, 3, ..., 12, with heights ranging from 1/36 to 6/36.
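
The probability table of Example 1.5 can be generated mechanically from Definition 1.2. The following Python sketch assumes fair, independent dice as in that example; the names `counts` and `f` are our own. It counts the outcomes that produce each sum and divides by 36.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Equally likely outcomes for two fair dice (Example 1.5).
S = list(product(range(1, 7), repeat=2))

# Number of outcomes producing each possible sum.
counts = Counter(a + b for a, b in S)


def f(x):
    """Probability function of the sum X: f(x) = P(X = x) for any real x."""
    return Fraction(counts.get(x, 0), len(S))


for x in range(2, 13):
    print(x, f(x))
```

Note that `Fraction` prints each value in lowest terms, so 2/36 appears as 1/18; the values agree with those in the table.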


E 


point with coordinates (x, f(z)) is the proba- 
X has the value z. 


of items ordered, you are
the event X < 10. As an-
e of votes cast in opposition
to a school bond levy, then the school board is interested in the
probability that the levy is approved, i.e., that X will be less than or
equal to some critical number that separates victory from defeat.

In general, if X is defined on the sample space S, then the event
that X is less than or equal to some number, say x, is the subset of S
containing those elements $o_k$ for which $X(o_k) \le x$. If we denote by
F(x) the probability of this event (assuming an acceptable assignment
of probabilities has been made to the simple events of S), then

(1.3) $F(x) = P(\{o_k \in S \mid X(o_k) \le x\})$.

In analogy with our agreement in (1.2), we adopt the shorthand
“X ≤ x” to denote the event written out in (1.3), and we then can
write

(1.4) $F(x) = P(X \le x)$.

Definition 1.3. The function F whose value for each real number
x is given by (1.4), or equivalently by (1.3), is called the distribution
function of the random variable X.

In other words, the distribution function of X has the set of all
real numbers as its domain, and the function assigns to each real
number x the probability that X has a value less than or equal to
(i.e., at most) the number x.

As our next example illustrates, it is an easy matter to calculate
the values of F, the distribution function of a random variable X,
when one knows f, the probability function of X. The distribution
function can be presented in graphical or tabular form, as we also
show.


Example 1.6. Let us continue with the dice experiment of
Example 1.5. The event symbolized by X ≤ 1 is the null event of the
sample space S, since the sum of the numbers on the dice cannot be
at most 1. Hence

F(1) = P(X ≤ 1) = 0.

The event X ≤ 2 is the subset {(1, 1)}, which is the same as the
event X = 2. Thus,

F(2) = P(X ≤ 2) = f(2) = 1/36.

The event X ≤ 3 is the subset {(1, 1), (1, 2), (2, 1)}, which is seen to
be the union of the events X = 2 and X = 3. Hence,

F(3) = P(X ≤ 3) = P(X = 2) + P(X = 3)
     = f(2) + f(3)
     = 1/36 + 2/36 = 3/36.

Similarly, the event X ≤ 4 is the union of the events X = 2, X = 3,
and X = 4, so that

F(4) = P(X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)
     = f(2) + f(3) + f(4)
     = 1/36 + 2/36 + 3/36 = 6/36.

Continuing in this way, we obtain the entries in the following
distribution table for the random variable X:


(1.5)

x    | 2    | 3    | 4    | 5     | 6     | 7     | 8     | 9     | 10    | 11    | 12
F(x) | 1/36 | 3/36 | 6/36 | 10/36 | 15/36 | 21/36 | 26/36 | 30/36 | 33/36 | 35/36 | 36/36


But remember that the domain of the distribution function F is
the set of all real numbers. Hence, we must find the value F(x) for
all numbers x, not just those in the distribution table. For example,
to find F(2.6) we note that the event X ≤ 2.6 is the subset {(1, 1)},
since the sum of the numbers on the dice is less than or equal to 2.6
if and only if the sum is exactly 2. Therefore,

F(2.6) = P(X ≤ 2.6) = 1/36.

In fact, F(x) = 1/36 for all x in the interval 2 ≤ x < 3, since for any
such x the event X ≤ x is the same subset, namely {(1, 1)}. Note
that this interval contains x = 2, since F(2) = 1/36, but does not
contain x = 3, since F(3) = 3/36. Thus, F(x) = 1/36 for x = 2.999...,
no matter how many nines we write down, but at x = 3, the value of
F jumps to F(3) = 3/36. Similarly, we find F(x) = 3/36 for all x in the
interval 3 ≤ x < 4, but a jump occurs at x = 4, since F(4) = 6/36;
then F(x) = 6/36 for all x in the interval 4 ≤ x < 5, but a jump
occurs at x = 5, since F(5) = 10/36; etc.

These facts are shown on the graph of the distribution function in
Figure 19. The graph consists entirely of horizontal line segments.
(A function having such a graph is appropriately called a step
function.) We use a heavy dot in Figure 19 to indicate which of the two
horizontal segments should be read at each jump (step) in the graph.
Note that the magnitude of the jump at x = 2 is f(2) = 1/36, the jump
at x = 3 is f(3) = 2/36, the jump at x = 4 is f(4) = 3/36, etc. Finally,
since the sum of the numbers on the dice is never less than 2 and
always at most 12, we have F(x) = 0 if x < 2 and F(x) = 1 if
x ≥ 12.

Figure 19. Graph of the distribution function F, a step function with
jumps at x = 2, 3, ..., 12.


If one knows the height of the graph of F at all points where jumps
occur, then the entire graph of F is easily drawn. It is for this reason
that we shall, as in (1.5), always list in the distribution table only
those x-values at which jumps of F occur.

If we are given the graph of the distribution function F of a random
variable X, then reading its height at any number x, we find F(x),
the probability that the value of X is less than or equal to x. Also,
we can determine the places where jumps in the graph occur, as well
as the magnitude of each jump, and so we can construct the
probability function of X. Thus, we can obtain the probability function
from the distribution function, or vice versa.
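
Both directions of this correspondence can be sketched in a few lines of Python. The code below is an illustration only; the names `f_table`, `F` and `jumps` are our own, and the probability table is that of the two-dice sum. It builds F from f by summation and then recovers f from F by measuring the jump at each possible value.

```python
from fractions import Fraction

# Probability table of the two-dice sum: f(x) = (6 - |x - 7|)/36 for x = 2..12.
f_table = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}


def F(x):
    """Distribution function: total probability of the possible values <= x."""
    return sum((p for value, p in f_table.items() if value <= x), Fraction(0))


# From F back to f: the jump of F at each possible value of X.
jumps = {}
previous = Fraction(0)
for value in sorted(f_table):
    jumps[value] = F(value) - previous
    previous = F(value)

print(F(1), F(2.6), F(3), F(12))
print(jumps == f_table)
```

Because F is defined for every real number, the function accepts arguments such as 2.6 that are not possible values of X.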

We have made our observations up to this point on the basis of
some special examples, especially the two-dice example. We now
turn to some general statements that apply to all probability and
distribution functions of random variables defined on finite sample
spaces.


Theorem 1.1. If a finite sample space S is given, if an acceptable
assignment of probabilities is made to its simple events, and if a
random variable X is defined with domain S, then f, the probability
function of X, has the following properties:

(i) $f(x) \ge 0$ for all x, but there are at most a finite number, say N,
of x-values for which f(x) > 0.

(ii) If $x_1, x_2, \ldots, x_N$ are all the x-values for which f(x) is positive,
i.e.,

(1.6) $f(x_k) > 0$ for k = 1, 2, ..., N,

then

(1.7) $\sum_{k=1}^{N} f(x_k) = 1$.

We leave the proof of this theorem for the problems. 


The probability table of the random variable X thus has the
following form:

(1.8)

x    | x_1    | x_2    | ... | x_N
f(x) | f(x_1) | f(x_2) | ... | f(x_N)

Under these circumstances, it is customary to say that X is a
random variable whose possible values are $x_1, x_2, \ldots, x_N$, and that the
value $x_k$ occurs with probability $f(x_k)$ for k = 1, 2, ..., N. This
language, which we shall use from now on, should bring to the reader’s
mind the probability table in (1.8), whose entries satisfy (1.6) and
(1.7).

Theorem 1.2. With the hypotheses of Theorem 1.1, the distribution
function F of the random variable X has the following properties:

Proof. To prove (i), we let m be the smallest and M the largest of
the possible values $x_1, x_2, \ldots, x_N$ of the random variable X. Then if
x < m, the event X ≤ x is the null set ∅, and so

F(x) = P(X ≤ x) = P(∅) = 0.

On the other hand, if x ≥ M, then the event X ≤ x is the entire
sample space S, and so

F(x) = P(X ≤ x) = P(S) = 1.

To prove (ii), note that if x > y, then the event X ≤ y is a subset
of the event X ≤ x. Hence, by Theorem II.4.2, we have

P(X ≤ y) ≤ P(X ≤ x),

which was to be proved.

To prove (iii), suppose $x_j$ and $x_k$ are neighboring x-values, with
$x_j < x_k$. Then the event X ≤ x is the same for all x in the interval
$x_j \le x < x_k$. Hence, F(x) is the same number for all such x, and the
graph of F is a horizontal line segment in this interval. At the
right-hand endpoint of the interval, we find

$P(X \le x_k) = P(X \le x_j) + P(X = x_k),$

so that the jump at $x = x_k$ is

$P(X \le x_k) - P(X \le x_j) = P(X = x_k) = f(x_k),$

as claimed.


In Theorems 1.1 and 1.2, we stated our hypotheses very carefully
in order to make clear that one must have a sample space, an
acceptable assignment of probabilities to its simple events, and a random
variable defined on the sample space, before talking about the
probability function or the distribution function of the random variable.
Nevertheless, in the probability literature (and starting in the next
section in this book) one often sees definitions and theorems that
begin with the words, “Let X be a random variable with probability
function f,” or “Let X be a random variable whose possible values
$x_1, x_2, \ldots, x_N$ occur with probabilities $f(x_1), f(x_2), \ldots, f(x_N)$,
respectively,” no mention being made of a sample space or assignment of
probabilities to simple events. To understand why this is an
acceptable state of affairs, we must first realize that the converse of Theorem
1.1 is true.

Theorem 1.3. Let a function f with properties (i) and (ii) in
Theorem 1.1 be given. Then there is a finite sample space S, an
acceptable assignment of probabilities to the simple events of S, and
a random variable X whose domain is S, such that f is the probability
function of X.

Proof. Define $S = \{x_1, x_2, \ldots, x_N\}$ and let $P(\{x_k\}) = f(x_k)$ for
k = 1, 2, ..., N. This is an acceptable assignment of probabilities,
because we have assumed that (1.6) and (1.7) hold. Now define the
random variable X as the identity function on S; i.e., to each element
$x_k \in S$ we assign the number $X(x_k) = x_k$. Then

$P(X = x_k) = P(\{x_k\}) = f(x_k)$ for k = 1, 2, ..., N

and

$P(X = x) = P(\emptyset) = 0$

if x is not one of the numbers $x_1, x_2, \ldots, x_N$. Hence f is the
probability function of X, and the theorem is proved.


We also must realize that infinitely many different random vari- 
ables can have the same probability function. (See Problem 1.13.) 
This possibility leads to the following oft-used terminology. 


Definition 1.4. Two or more random variables are said to be iden- 
tically distributed if and only if they have equal (i.e., identical) prob- 
ability functions (and hence identical distribution functions). 
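
A small computation makes Definition 1.4 concrete. In the Python sketch below, X is the die game of Example 1.1, while the coin game W (win 1 on heads, lose 1 on tails) is a hypothetical variable introduced only for this illustration; the helper `probability_function` is likewise our own. The two variables have different sample spaces and yet the same probability function.

```python
from fractions import Fraction


def probability_function(sample_space, prob, rv):
    """Return {value: probability} for the random variable rv on a finite sample space."""
    table = {}
    for outcome in sample_space:
        value = rv(outcome)
        table[value] = table.get(value, Fraction(0)) + prob[outcome]
    return table


# Example 1.1: a fair die, gain +1 on 1, 2, 3 and -1 on 4, 5, 6.
die = [1, 2, 3, 4, 5, 6]
die_prob = {o: Fraction(1, 6) for o in die}


def X(o):
    return 1 if o <= 3 else -1


# Hypothetical coin game: gain +1 on heads, -1 on tails.
coin = ["H", "T"]
coin_prob = {o: Fraction(1, 2) for o in coin}


def W(o):
    return 1 if o == "H" else -1


f_X = probability_function(die, die_prob, X)
f_W = probability_function(coin, coin_prob, W)
print(f_X == f_W)
```

X and W are different functions, since their domains differ, but they are identically distributed in the sense of the definition.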


Thus, whenever we want to make definitions or prove theorems 
that depend only on the probability function of a random variable, 
we are actually making definitions and proving theorems for any one 
of an infinite set of identically distributed random variables. The 


an serve as the prototype of all identically 
ith the given probability function. 
tves to explain why, in the following 
ion of the underlying sample spaces 
simple events, when we talk of ran- 
lity and distribution functions. 


PROBLEMS 


1.1. An experiment consists of three independent tosses of a fair coin. Let X
be the random variable whose value for any outcome is the number of
heads obtained.
(a) Find the probability function of X, and construct a probability
table and a probability chart.
(b) Find the distribution function of X and draw its graph.

1.2. Repeat the preceding problem, but now (a) let X be the random
variable whose value for any outcome is the number of heads minus the
number of tails, (b) let X denote the gain of a player who wins $2 if
the first head occurs at the first toss, wins $1 if the first head occurs
at the second toss, loses $1 if the first head occurs at the third toss,
and loses $2 if all three tosses are tails.


1.3. There are two defectives in a lot of eight articles. A sample of four 
articles is drawn at random (without replacement) from the lot. Let 
X denote the number of defectives in the sample. 


(a) Determine the probability function of X and construct a
probability table.
(b) Determine the distribution function of X and draw its graph.


1.4. The annual income of six people A, B, C, D, E, F is given in the
following table.

Person              | A | B | C | D | E | F
Income (in $1000's) | 3 | 3 | 4 | 5 | 6 | 6

A committee of k people is selected from these six people,
where k = 1, 2, ..., 6, and the random variable X_k is defined as the
average income of the k committee members. For each value of k:
(a) determine the probability function of X_k and construct the
corresponding probability chart; (b) determine the distribution function
of X_k and draw its graph. (c) Compare the six probability functions,
noting especially how they change as k increases.


1.5. The random variable X has a probability function f of the following
form, where k is some number:
k ifz=0 
|j] % ife=1 
fo-ls itr =2 
0 otherwise. 


(a) Determine the value of k. 

(b) Find P(X < 2), P(X < 2), PO < X < 2). 

(c) What is the smallest value of x for which P(X < z) > .5? 
(d) Determine the distribution function of X.

1.6. Let X denote the number of hours you study during a randomly
selected school day. Suppose the probability function of X has the
following form, where k is some number:


Ei ifz=0 

kx ifs = lor2 
clan i ee uchad 

0 otherwise. 


(a) Find the value of k. 

(b) Draw the probability chart. 

(c) What is the probability that you study at least two hours? Exactly 
two hours? At most two hours? 

(d) What number of hours is such that you study at least this number 
of hours with probability at least .70? 

(e) Determine the distribution function of X. 

(f) What is the conditional probability that you study three hours, 
given that you do study? 


1.7. The distribution function of a random variable X is given as follows:


0  ifz«-1 

i if-1£z«1 
F@)=4 4 ifl<r<2 

$ f2<r<8 

l es 5. 


(a) Draw the graph of F. 


(b) Find P(X <1), P(X = 1), P(-1 < x <2), P(—1 < X <2), 


P(-1<X <2), P(x <3), P(—2 < X < 3.5), P(1.5 «X«27 
(c) Determine the probability function of X, and construct a
probability table.


1.8. An urn contains three green and two red balls. Find the probability
function and construct the probability table for each of the following
random variables.

(a) The number of red balls in a random sample of three balls drawn
with replacement.

(b) The number of red balls in a random sample of three balls drawn
without replacement.

(c) The number of balls that are drawn (one by one, with replacement)
in order to get a red ball.

(d) The number of balls that are drawn (one by one, without
replacement) in order to get a red ball.

1.9. A bridge hand is dealt from a full deck. Let X denote the number of
spades in the hand. Determine the probability function of the random
variable X.

1.10. Let X be a random variable whose possible values $x_1, x_2, \ldots, x_N$
occur with probabilities $f(x_1), f(x_2), \ldots, f(x_N)$, respectively. If F is
the distribution function of X, show that

$F(x) = \sum_{x_k \le x} f(x_k),$

where the sum is taken over all k-values for which $x_k \le x$. (Because
values of F are obtained by successive additions of f-values, F is often
called the cumulative distribution function of X.)


1.11. Let f and F be the probability and distribution function, respectively,
of a random variable X. Show that for any numbers a and b (a < b),

(a) P(a < X ≤ b) = F(b) − F(a)

(b) P(a ≤ X ≤ b) = F(b) − F(a) + f(a)

(c) P(a ≤ X < b) = F(b) − F(a) + f(a) − f(b)

(d) P(a < X < b) = F(b) − F(a) − f(b)

1.12. (a) Let $x_1, x_2, \ldots, x_N$ be the possible values of a random variable X
defined on a sample space S. Show that if no simple event of S is
assigned zero probability, then

$\{X = x_1, X = x_2, \ldots, X = x_N\}$

is a partition of S.
(b) Prove Theorem 1.1.


1.13. A fair coin is tossed and you win $2 if it falls heads, win $1 if it falls
tails. Call your gain X_1. A fair die is rolled and you win $2 if a 1, 2,
or 3 shows, win $1 if a 4, 5, or 6 shows. Call your gain X_2.

(a) Show that X_1 and X_2 are different random variables, but have the
same probability function. (Note: Two functions are equal if and
only if they have the same domain and the same value for each
element in their common domain.)

(b) Show that there are infinitely many different random variables that
have the same probability function as X_1.


1.14. Let $S = \{o_1, o_2, \ldots, o_n\}$ and let E be any event of S. Define $X_E$,
the characteristic random variable of event E, as follows:

$X_E(o_k) = 1$ if $o_k \in E$, and $X_E(o_k) = 0$ otherwise.

In other words, $X_E$ is equal to 1 if E occurs, and $X_E$ is equal to 0 if E
does not occur. Prove the following properties of characteristic random
variables:

(a) $X_\emptyset$ is identically 0; i.e., $X_\emptyset(o_k) = 0$ for k = 1, 2, ..., n.
(b) $X_S$ is identically 1; i.e., $X_S(o_k) = 1$ for k = 1, 2, ..., n.
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(c) cede F, Cs in Eod ue oe (To say Xz = Xr means 
AEO) = Arlo = 1,4, 1. : " 
(d) If E SF, then Er. et = ee s (To say Xe € Xr 
(e) rents bs " (The value of Xe + Xr at o, is defined 
to be Xz(o;) + X i(;).) 


(f) $X_{E \cap F} = X_E X_F$. (The value of $X_E X_F$ at $o_k$ is defined to be
$X_E(o_k) X_F(o_k)$.)
(g) $X_{E \cup F} = X_E + X_F - X_{E \cap F}$.


2. The mean of a random variable 


In many problems, the random variable under study has a
complicated probability function. It is therefore desirable to be able
to describe some features of the random variable by means of a few
numbers that can be computed from its probability function. For
some purposes, these numbers, rather than the entire function, are
all that is needed. In this section, we concentrate on a number,
called the mean of the random variable, that is a measure of location
in the sense that it roughly locates a “middle” or “average” value of
the random variable.


There are other often-used measures of location, in particular, the
median and the mode of a random variable. But these are of lesser
importance than the mean, and so we ask the interested reader to
learn about them in the problems. (See Problems 2.21-2.22.)

Definition 2.1. Let X be a random variable whose possible values
$x_1, x_2, \ldots, x_N$ occur with probabilities $f(x_1), f(x_2), \ldots, f(x_N)$,
respectively. The mean of X, denoted by E(X), is the number

(2.1) $E(X) = \sum_{k=1}^{N} x_k f(x_k)$;

i.e., the mean of X is the weighted average of the possible values of X,
each value being weighted by the probability with which it occurs.
Let us note that the concept of weighted average is a familiar one.
When a student computes his average grade in a course in which his
six grades are 75, 90, 75, 87, 75, and 90, he divides the sum of all his
grades by the total number of grades:

$\frac{75 + 90 + 75 + 87 + 75 + 90}{6} = 82.$

But he can also write

$\frac{75 + 90 + 75 + 87 + 75 + 90}{6} = \frac{75(3) + 87(1) + 90(2)}{6} = 75\left(\tfrac{3}{6}\right) + 87\left(\tfrac{1}{6}\right) + 90\left(\tfrac{2}{6}\right),$

from which we see that the average grade is the weighted average of
the student’s grades, each grade having as weight the proportion (or
relative frequency) with which it occurs among all the grades.

The choice of the letter E to denote a mean is due to the fact that
the concept of mean was first introduced with reference to games of
chance, where the mean of the gain of a player is called his
mathematical expectation. The mean is also called the expected value of X,
but it is important to realize that this term is misleading, since the
mean is not a value that we expect the random variable to assume.
In fact, E(X) can be different from all the possible values of the
random variable X, as the following example shows. We should not be
surprised by this, since it occurs in the illustration concerning grades,
where the average grade was different from all the actual grades.


Example 2.1. Let X denote the number of points obtained in a
throw of a fair die. Then the possible values of X are 1, 2, ..., 6, and
each occurs with probability 1/6. Hence, applying (2.1),

E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2.

Example 2.2. Let X denote the sum of the numbers on two fair
dice. In Example 1.5, we computed the probability function of the
random variable X. Now we find

E(X) = 2(1/36) + 3(2/36) + 4(3/36) + 5(4/36) + 6(5/36) + 7(6/36)
       + 8(5/36) + 9(4/36) + 10(3/36) + 11(2/36) + 12(1/36),

or

E(X) = 7.
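
Formula (2.1) is a one-line computation once the probability table is stored. The Python sketch below (the function name `mean` and the dictionary layout are our own choices) reproduces the means found in Examples 2.1 and 2.2 using exact fractions.

```python
from fractions import Fraction


def mean(table):
    """Mean by (2.1): sum of (value * probability) over the probability table."""
    return sum(x * p for x, p in table.items())


# Example 2.1: one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Example 2.2: sum of two fair dice, f(x) = (6 - |x - 7|)/36 for x = 2..12.
two_dice = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

print(mean(die))
print(mean(two_dice))
```

The first mean, 7/2, is not itself a possible value of the die, which illustrates the remark that the "expected value" need not be a value we expect to observe.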


Example 2.3. A florist stocks a perishable flower which costs him
50 cents and which he prices at $1.50 on the first day it is in his shop.
Any flowers not sold that first day are worthless and are thrown
away. Let X be the random variable denoting the number of flowers
that customers order on a randomly selected day. The florist has
found that the probability function of X is given by the following
probability table:

x    | 0  | 1  | 2  | 3
f(x) | .1 | .4 | .3 | .2

How many flowers should the florist stock in order to maximize the
mean (or expected value) of his net profit?

For k = 0, 1, 2, 3, let Y_k be the random variable denoting the florist’s
net profit when he stocks k flowers. We determine the probability
function of each of these random variables, compute E(Y_k) for each,
and thus determine the value of k for which E(Y_k) is largest.

If he stocks no flowers, then his profit Y_0 is equal to zero with
probability 1, and so E(Y_0) = 0. If he stocks one flower, then he loses 50
cents if no flowers are ordered and makes a net profit of 150 − 50 =
100 cents if at least one customer orders a flower. Hence the
probability function of Y_1 is given by the table

y_1          | −50 | 100
P(Y_1 = y_1) | .1  | .9

and so

E(Y_1) = −50(.1) + 100(.9) = 85 cents.

Let the reader check that the probability tables of Y_2 and Y_3 are
given by

y_2          | −100 | 50 | 200
P(Y_2 = y_2) | .1   | .4 | .5

y_3          | −150 | 0  | 150 | 300
P(Y_3 = y_3) | .1   | .4 | .3  | .2

from which we compute E(Y_2) = 110 cents and E(Y_3) = 90 cents.
Thus the florist maximizes his mean net profit by stocking two
flowers. (See Problem 2.7.)

If X is a random variable defined on a sample space S and x_k is a
possible value of X, then it often happens that we are interested less
in x_k itself than in some number determined by x_k, like 5x_k, 3x_k − 2,
x_k^2, etc. (As a simple example, if X is the number of units demanded
of a product that sells for $7 and costs $2 per unit, then the demand
x_k determines the profit 5x_k.) In such cases, we must first understand
that a new random variable is determined. Then we can turn to
methods of calculating the mean of this new random variable.


Definition 2.2. Let X be a random variable defined on the sample
space S, and suppose g is a numerical-valued function whose domain
includes the range of X. Then the composite function of g with X,
denoted by g(X), is defined as the function whose value for any
element $o_k \in S$ is the real number $g(X(o_k))$.

Let us review this definition in terms of the function machines in
Figure 20. We start with any element $o_k \in S$, and first obtain the
value $X(o_k)$. Now we have assumed in Definition 2.2 that any
output number of the X-machine can serve as an input number of the
g-machine. In particular, we can use $X(o_k)$ as input for the g-machine,
and thereby obtain the value of g at $X(o_k)$. This final output number
is therefore $g(X(o_k))$. The two machines taken together in the given
order can be thought of as one composite machine that takes the
input element $o_k \in S$ and produces the output number $g(X(o_k))$. The
composite function g(X) defined by this composite machine is
therefore a random variable whose domain is S. If we let Y = g(X), then
the random variable Y assigns to each element $o_k \in S$ the number
$Y(o_k) = g(X(o_k))$.

Figure 20. Two function machines in series: the input $o_k \in S$ enters the
X-machine; its output $X(o_k)$ becomes the input of the g-machine, whose
output is $g(X(o_k)) = Y(o_k)$.

These ideas are illustrated in the next example.


Example 2.4. Suppose we toss two fair coins and take S =
{HH, HT, TH, TT} as sample space. Let X denote the number of
heads obtained and let us take $g(x) = (x - 1)^2$, so that Y = g(X) =
$(X - 1)^2$. In words, Y is the square of the deviation of the number
of heads from 1. For each element $o_k \in S$, we obtain the value of the
random variable X and then find the corresponding value of the
random variable Y:

o_k | P({o_k}) | X(o_k) | Y(o_k)
HH  | 1/4      | 2      | (2 − 1)^2 = 1
HT  | 1/4      | 1      | (1 − 1)^2 = 0
TH  | 1/4      | 1      | (1 − 1)^2 = 0
TT  | 1/4      | 0      | (0 − 1)^2 = 1

We thus obtain the probability functions of X and Y:

x        | 0   | 1   | 2
P(X = x) | 1/4 | 1/2 | 1/4

y        | 0   | 1
P(Y = y) | 1/2 | 1/2

And now we can compute the mean of Y by applying Definition 2.1.
We find

(2.2) $E(Y) = E[(X - 1)^2] = 0 \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} = \tfrac{1}{2}.$

Our next result shows that we can compute the mean of Y = g(X)
directly from the probability function of X without first finding the
probability function of Y.

Theorem 2.1. Let X be a random variable whose possible values
\(x_1, x_2, \ldots, x_N\) occur with probabilities \(f(x_1), f(x_2), \ldots, f(x_N)\),
respectively. If Y = g(X), then the mean of the random variable Y is
given by

\[ E(Y) = E[g(X)] = \sum_{k=1}^{N} g(x_k) f(x_k). \tag{2.3} \]


Proof. Let the possible values of Y be \(y_1, y_2, \ldots, y_M\). (We know
that \(M \le N\), since it can happen that Y has the same value for two
different values of X.) Then by the definition of E(Y) we have

\[ E(Y) = \sum_{j=1}^{M} y_j P(Y = y_j)
        = \sum_{j=1}^{M} y_j P(g(X) = y_j)
        = \sum_{j=1}^{M} y_j P(X = x_k, \text{ where } g(x_k) = y_j). \]

Now the probability of the event in this last expression is the sum
of the probabilities of one or more (disjoint) events of the form
\(X = x_k\), where \(x_k\) is a possible value of X. And for each of these
events we know that \(y_j = g(x_k)\). As j varies from 1 to M, we include
terms of the form \(g(x_k)P(X = x_k)\) for each possible value \(x_k\). Hence

\[ E(Y) = \sum_{k=1}^{N} g(x_k) P(X = x_k), \]

which is precisely what we set out to prove.


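Example 2.5. Let us illustrate the use of (2.3) by computing E(Y)
for the random variable \(Y = (X - 1)^2\) in Example 2.4. We find

\[ E(Y) = E[(X - 1)^2] = \sum_{k=1}^{3} (x_k - 1)^2 f(x_k)
        = (0 - 1)^2 \cdot \tfrac{1}{4} + (1 - 1)^2 \cdot \tfrac{1}{2} + (2 - 1)^2 \cdot \tfrac{1}{4}
        = \tfrac{1}{2}, \]

as in (2.2).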
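
The two routes to E(Y) used in Examples 2.4 and 2.5 can be compared in a short Python sketch. The helper names (`pmf_of_function`, `mean`, `mean_of_function`) are illustrative assumptions, not notation from the text, and exact fractions are used so that the two answers can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction


def pmf_of_function(pmf, g):
    """Probability function of Y = g(X): pool the x-values that g maps to the same y."""
    out = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)


def mean(pmf):
    """Mean in the sense of Definition 2.1: sum of value times probability."""
    return sum(x * p for x, p in pmf.items())


def mean_of_function(pmf, g):
    """Formula (2.3): sum of g(x_k) f(x_k), with no need for the distribution of Y."""
    return sum(g(x) * p for x, p in pmf.items())


# Number of heads in two tosses of a fair coin (Example 2.4).
f_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def g(x):
    return (x - 1) ** 2


f_Y = pmf_of_function(f_X, g)          # long route: first find the distribution of Y
mean_long_route = mean(f_Y)
mean_short_route = mean_of_function(f_X, g)  # short route: apply (2.3) directly
```

Both routes give the same mean; the second never builds the probability function of Y.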


Computing E[g(X)] by means of Formula (2.3) is generally much
easier than first determining the probability function of the random
variable g(X) and then computing its mean by using Definition 2.1.
In our later work, for example, we shall use the formulas

\[ E(X^2) = \sum_{k=1}^{N} x_k^2 f(x_k), \tag{2.4} \]
\[ E[X - E(X)] = \sum_{k=1}^{N} [x_k - E(X)] f(x_k), \tag{2.5} \]
\[ E([X - E(X)]^2) = \sum_{k=1}^{N} [x_k - E(X)]^2 f(x_k), \tag{2.6} \]

which are obtained from (2.3) by putting g(x) in turn equal to \(x^2\),
\(x - E(X)\), and \([x - E(X)]^2\).


Example 2.6. If X denotes the number of points obtained in a roll
of a fair die (see Example 2.1), then from (2.4) we find

\[ E(X^2) = 1^2(\tfrac{1}{6}) + 2^2(\tfrac{1}{6}) + 3^2(\tfrac{1}{6}) + 4^2(\tfrac{1}{6}) + 5^2(\tfrac{1}{6}) + 6^2(\tfrac{1}{6}) = \tfrac{91}{6}. \]

Theorem 2.1 leads to a number of important results to which we
now turn.


Theorem 2.2. If a and b are any numbers, then

\[ E(aX + b) = aE(X) + b. \tag{2.7} \]

Proof. We let g(x) = ax + b in (2.3) and find

\[ E(Y) = E(aX + b) = \sum_{k=1}^{N} (a x_k + b) f(x_k)
        = \sum_{k=1}^{N} a x_k f(x_k) + \sum_{k=1}^{N} b f(x_k)
        = a \sum_{k=1}^{N} x_k f(x_k) + b \sum_{k=1}^{N} f(x_k)
        = aE(X) + b, \]

the last equality following from the definition of E(X), as given in
(2.1), together with the fact, expressed in (1.7), that the sum of all
the probabilities \(f(x_k)\) is 1.
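
As a numerical check of (2.7), the sketch below evaluates both sides for a fair die. The function name `expect` and the particular constants a = 3, b = -2 are arbitrary choices made for illustration only.

```python
from fractions import Fraction

# Probability function of the number of points on a fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}


def expect(pmf, g=lambda x: x):
    """E[g(X)] computed with Formula (2.3); g defaults to the identity, giving E(X)."""
    return sum(g(x) * p for x, p in pmf.items())


a, b = Fraction(3), Fraction(-2)  # arbitrary illustrative constants

left_side = expect(die, lambda x: a * x + b)   # E(aX + b), term by term
right_side = a * expect(die) + b               # aE(X) + b

# Special case behind (2.8): a = 1, b = -E(X) gives a mean deviation of zero.
mean_deviation = expect(die, lambda x: x - expect(die))
```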


As special cases of (2.7), we have

\[ E(X + b) = E(X) + b \quad \text{and} \quad E(aX) = aE(X). \]

In words, adding a fixed amount to every value of a random variable
changes the mean of the random variable by the same amount, and
multiplying every value by a fixed number multiplies the mean by that
number. It is comforting that our formulas yield
these very appealing and reasonable results.

If we put a = 1 and b = -E(X) in (2.7), we obtain

\[ E[X - E(X)] = E(X) - E(X) = 0. \tag{2.8} \]

Now X - E(X) denotes the algebraic or signed deviation of X from
its mean. This deviation is positive, zero, or negative depending
upon whether the value of X is greater than, equal to, or less than
the number E(X). Thus, Formula (2.8) asserts that the mean deviation
of a random variable from its mean is zero.

It is possible to give a mechanical interpretation of some of our
results. We think of N particles distributed along the x-axis at the
points \(x_1, x_2, \ldots, x_N\). The particle at point \(x_k\) has mass \(f(x_k)\) for
k = 1, 2, ..., N. Then (1.6) and (1.7) express the facts that each
particle has positive mass and the total mass of all N particles is 1.
With this interpretation, the sum in (2.1) defines what the physicist
calls the center of gravity of the system of N particles. Thus, the center
of gravity is the weighted average of the x-values, each having weight
equal to the mass concentrated at that x-value.

The number \(x_k - E(X)\) is the signed distance of the particle at \(x_k\)
from the center of gravity. If we imagine the x-axis as a lever suspended
on a fulcrum placed at the center of gravity, then \(x_k - E(X)\)
is positive if \(x_k\) is to the right of the fulcrum and negative if \(x_k\) is to
the left of the fulcrum. (See Figure 21.) The moment about this fulcrum
of the particle of mass \(f(x_k)\) at \(x_k\) is the product of its mass and
its signed distance from the fulcrum. (This signed distance is called
the moment arm in mechanics.) The total moment (about the fulcrum)
of the entire system of N particles is therefore precisely the
sum in (2.5). When this total moment is zero, the lever is in equilibrium;
i.e., it balances and does not turn about the fulcrum. Formula
(2.8) is therefore merely the expression of the following property
of the center of gravity: a distribution of mass particles is in equilibrium
with respect to motion about a fulcrum placed at the center of
gravity of the system. It is possible to show further that the center
of gravity is the only location for a fulcrum if the lever is to be in
equilibrium and not turn. (See Problem 2.14.)

[Figure 21: a lever along the x-axis carrying masses \(f(x_1), f(x_2), \ldots, f(x_N)\)
at positions \(x_1, x_2, \ldots, x_N\), with the fulcrum at E(X) and the signed
distance \(x_k - E(X)\) marked.]

In our concluding example, we find it convenient first to determine
the distribution function of X, then the probability function, and
finally E(X). This example therefore serves as a quick review of
some of the material presented up to now in this chapter.

Example 2.7. In a certain city, there are 25 officials with city-owned
limousines carrying license plates numbered 1, 2, 3, ..., 25.
In a brief period in front of city hall we observe two official cars.
Let us interpret this as a random sample of two cars drawn with
replacement from the population of 25 cars. Let X denote the larger
license plate number observed. (If the two numbers we observe are
the same, then X is just the number observed.) We want to find the
mean of the random variable X.

The possible values of X are clearly the integers 1, 2, ..., 25. The
event \(X \le k\) occurs if and only if both license plate numbers are less
than or equal to k. Hence for k = 1, 2, ..., 25,

\[ F(k) = P(X \le k) = \left(\frac{k}{25}\right)^2 = \frac{k^2}{625}. \]

But

\[ f(k) = P(X = k) = P(X \le k) - P(X \le k - 1). \]

Therefore for k = 1, 2, ..., 25,

\[ f(k) = \frac{k^2}{625} - \frac{(k - 1)^2}{625} = \frac{2k - 1}{625}. \]

Having found the probability function of X, we can use (2.1) to compute
E(X). The values \(x_1, x_2, \ldots, x_N\) in (2.1) are now just the
integers 1, 2, ..., 25. We thus find that

\[ E(X) = \sum_{k=1}^{25} k f(k) = \sum_{k=1}^{25} k \, \frac{2k - 1}{625}
        = \frac{1}{625} \left[ 2 \sum_{k=1}^{25} k^2 - \sum_{k=1}^{25} k \right]. \]

But the sum of the first N positive integers and the sum of the squares
of the first N positive integers are given by the formulas

\[ \sum_{k=1}^{N} k = \frac{N(N + 1)}{2}, \qquad \sum_{k=1}^{N} k^2 = \frac{N(N + 1)(2N + 1)}{6}. \tag{2.9} \]

It follows that

\[ E(X) = \frac{1}{625} \left[ 2 \cdot \frac{(25)(26)(51)}{6} - \frac{(25)(26)}{2} \right] = \frac{429}{25} = 17.16. \]

Thus the mean of the larger of the two observed license plate numbers
is 17.16.
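
The result of Example 2.7 can be checked by brute force. The following Python sketch, which assumes only the sampling model stated in the example (two draws with replacement, all 625 ordered pairs equally likely), enumerates the pairs instead of using the distribution function; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

N = 25  # number of license plates

# Build the probability function of X = larger of the two observed numbers
# by listing all N*N equally likely ordered pairs.
pmf = {k: Fraction(0) for k in range(1, N + 1)}
for first, second in product(range(1, N + 1), repeat=2):
    pmf[max(first, second)] += Fraction(1, N * N)

# Mean of X by Definition (2.1).
mean_X = sum(k * p for k, p in pmf.items())

# Closed form derived in the example: f(k) = (2k - 1)/625.
closed_form = {k: Fraction(2 * k - 1, N * N) for k in range(1, N + 1)}
```

The enumeration agrees with the closed form \(f(k) = (2k - 1)/625\), and `mean_X` equals 429/25.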


A problem related to this one, but considerably more difficult, is
the following. Suppose you do not know how many people are attending
a convention, but you do know that as each person entered he was
given an identification tag with a number on it. The tags are numbered
serially from 1 to N, where N is the unknown number in attendance.
You select a random sample of ten people, let us say, and
observe that the largest number on their badges is 261. What estimate
do you then make of the total attendance at the convention?
This is a problem in statistical estimation in which some characteristic
of a population (the total number of people) is to be estimated on
the basis of information (the largest of the ten selected badge numbers)
obtained from a sample drawn from the population. In Example
2.7, we did a very simple problem of sampling theory, in which we
answered a probability question about a sample on the assumption
that we knew everything about the population from which the sample
is drawn.


PROBLEMS

2.1. Let X denote the number of heads obtained in three independent tosses
     of a fair coin. (See Problem 1.1.)

     (a) Find E(X).

     (b) Determine the probability function of the random variable
         Y = X - E(X) and then verify (2.8) by computing E(Y).

     (c) Determine the probability function of the random variable
         \(Z = [X - E(X)]^2\) and then calculate E(Z). Check your result by also
         using Theorem 2.1 to compute E(Z).

2.2. A coin (perhaps biased) is tossed. Let X denote the number of heads
     obtained. Determine the probability function of the random variable
     Y = X(1 - X).

2.3. A thousand tickets are sold in a lottery in which there is one top prize
     of $500, four prizes of $100 each, and five prizes of $10 each. A ticket
     costs $1. If X is your net gain when you buy one ticket, find E(X).

2.4. In roulette, the wheel has the 37 numbers 0, 1, 2, ..., 36 marked on
     equally spaced slots. A player bets $1 on a given number. He receives
     $36 from the croupier if the ball comes to rest in this slot; otherwise,
     he gets nothing. If X is the player's net gain, find E(X).

2.5. Refer to Problem 1.3 and find the mean number of defectives in the
     sample.

2.6. Two defective tubes get mixed up with two good ones. You select and
     test one tube at a time until you have discovered both defectives. Let
     X be the number of tubes selected when the second defective is
     discovered. Determine the probability function of X and compute the
     mean of X. (Cf. Problem II.5.9.)

2.7. In Example 2.3, we assumed that there is no loss caused by inability to
     fill a customer's order. In actual practice, the florist might consider
     that turning a customer away for lack of stock is equivalent to sustaining
     a monetary loss, because customers may give their future business
     to another florist, good will is lost, etc. Suppose the florist counts each
     customer turned away for lack of stock as equivalent to a 50 cent loss.
     (a) Show that he should still stock two flowers. (b) How large must
     be the equivalent monetary loss of each customer turned away for lack
     of stock before the florist maximizes his mean net profit by stocking
     three flowers?


2.8. Player A pays B $1 and two fair dice are rolled. A receives $[?] if only
     one six appears, $4 if two sixes appear, and he gets nothing if no six
     appears. Let X denote player A's net gain. (a) Find E(X). (b) How much
     must A pay B as entrance fee (instead of $1) in order to have E(X) = 0?

2.9. Player A bets $1 against B's $b that if two cards are dealt from a
     standard deck, both cards will be of the same color. If X is player A's
     net gain, what value of b is required to make E(X) = 0? With the
     value of b so determined, what is E(Y) if Y is player B's net gain?

2.10. Compute the means of the random variables defined in Problem 1.8.

2.11. Suppose you have convinced a friend to play the following game with
      you. A fair coin is to be tossed until the first head appears, but the
      game is over if no head appears after 20 tosses. Your friend agrees to
      pay you $2 if a head turns up on the first toss, $2^2 (= $4) if the first
      head comes up on the second toss, ..., $2^20 (= $1,048,576) if the first
      head comes up on the twentieth toss. You receive nothing if the 20
      tosses yield no head. What entrance fee should you pay your friend
      before the game to make your net gain have mean zero?

2.12. A store sells an item which yields a profit of $3 per item. If the item
      is out of stock, customers buy elsewhere. At the end of one day, the
      store manager notices that there are only five items left in stock. The
      number of items demanded by customers each day is a random variable
      D such that E(D) = 12. Assuming additional stock is unavailable, let
      X denote the profit lost due to the manager's failure to reorder. Find
      E(X). What theorem have you used?

2.13. Let X denote the sum of the numbers obtained when two fair dice are
      rolled. Find \(E(X^2)\). Is \(E(X^2) = [E(X)]^2\)? (Refer to Example 2.2.)

2.14. Show that if \(\sum_{k=1}^{N} (x_k - c) f(x_k) = 0\), then c = E(X).

2.15. In the carnival game known as chuck-a-luck, a player pays an amount b,
      his entrance fee for playing the game. He selects one number from the
      six numbers 1, 2, ..., 6 and then rolls three dice. If all three dice show
      the number the player selected, the player is paid four times his
      entrance fee; if two of the dice show the number, the player is paid
      three times his entrance fee; and if only one of the dice shows the
      number, the player receives an amount equal to twice his entrance fee.
      If his number does not show up, then he receives nothing. Let X
      denote the player's net gain in a single play of this game. Assuming
      the dice are fair: (a) determine the probability function of the random
      variable X; (b) compute E(X) and thus show, in particular, that if
      the entrance fee is $1, then the player sustains a mean loss per game
      of about 8 cents.


2.16. After working together on many jobs, four people A, B, C, and D are
      each asked to write on a slip of paper the name of that person (from
      among his three co-workers) who is most cooperative. Let X denote
      the number of people who are considered most cooperative by none of
      their co-workers. Assuming that each person selects one of his
      co-workers at random and writes his name on the slip of paper, find E(X).

2.17. A drunk reaches home and wants to open his front door. He has five
      keys on his key chain and tries them one at a time and at random.
      He is alert enough to eliminate unsuccessful keys from subsequent
      selections. Let X denote the number of keys he tries in order to find
      the one that opens his door. Find E(X).

2.18. Refer to Problem 1.4 and show that \(E(X_k)\) has the same value for
      k = 1, 2, ..., 6.

2.19. An urn has ten balls, numbered from 1 to 10. You are offered the
      following options:

      (1) Pay $1, draw a ball from the urn, and be paid a number of dollars
          equal to the number on the ball.

      (2) Pay $1, draw a ball from the urn. If the number on the ball is
          greater than 5, then be paid a number of dollars equal to the
          number on the ball. If the number on the ball is 5 or less, then put
          the ball back in the urn, pay $3, draw another ball from the urn
          and be paid a number of dollars equal to the number on the ball.

      Let \(X_1\) and \(X_2\) be your net profit when you accept options 1 and 2,
      respectively. (a) Determine the probability functions of \(X_1\) and \(X_2\).
      (b) If you want to maximize your mean net profit, which option
      do you accept?

2.20. (a) One number is selected at random from the first ten positive
          integers. Let X denote the number obtained. Find E(X).

      (b) Two numbers are selected at random (with replacement) from the
          first ten positive integers. Let X denote the larger of the two
          numbers obtained. Find E(X).

      (c) Three numbers are selected at random (with replacement) from the
          first ten positive integers. Let X denote the largest of the three
          numbers obtained. Find E(X).

      (d) Redo Parts (b) and (c), assuming the numbers are selected without
          replacement.


2.21. Let X be a random variable with distribution function F. As we know,
      the graph of F is a step function. Imagine vertical line segments drawn
      connecting the lower and upper pieces of the graph at \(x_1, x_2, \ldots, x_N\)
      (where jumps occur) and call the new graph the extended graph of F.
      Select any probability p on the vertical axis and consider the horizontal
      line at this height. The x-coordinate of any point where this horizontal
      line intersects the extended graph of F is called a 100p-th percentile of
      the random variable X. A 25th percentile is called a lower quartile;
      a 75th percentile is an upper quartile; a 50th percentile is a median of X.

      (a) In terms of the construction just described, state when there is a
          unique 100p-th percentile and when there are more than one.

      (b) Show that a median of X can equivalently be defined as any number
          m such that \(P(X \le m) \ge \tfrac{1}{2}\) and \(P(X \ge m) \ge \tfrac{1}{2}\). Formulate
          a corresponding definition for a 100p-th percentile of X.

      (c) Let X denote the sum of the numbers on two fair dice. Show that
          the median of X is 7, the lower quartile is 5, and the upper quartile
          is 9. (Cf. Example 2.2.)

      (d) Consider the random variable X defined in Example 2.3. Show that
          E(X) = 1.6. Also show that any number between 1 and 2 inclusive
          is a median of X.

2.22. A possible value of X that occurs with a probability at least as large
      as the probability of any other value of X is said to be a mode (or
      modal value) of the random variable X.

      (a) Let X be the sum of the numbers on two fair dice. Show that the
          mode of X is 7 so that this random variable happens to have its
          mean, median, and mode all equal.

      (b) A and B match pennies four times. On each match A wins one
          penny with probability 1/2 and loses one penny with probability 1/2.
          Let X denote the number of times during the course of the game
          that A is ahead. Find the mean, median, and mode of the random
          variable X.

2.23. Suppose the probability function f is symmetrical about the line x = a,
      i.e., f(a + x) = f(a - x) for all x. Show that E(X) = a.

3. The variance and standard deviation of a random variable


The mean of a random variable X is an "average" value of X; it
gives us no information about the variability of the values of X. For
many purposes, we also require a measure of this variability, of the
"spread" or "dispersion" of the values of the random variable. This
requirement is especially apparent as soon as one realizes that random
variables with different probability functions can have equal
means. For example, we have tabulated below the probability functions
of four different random variables.

    (3.1)    X_1:    E(X_1) = 2.75
    (3.2)    X_2:    E(X_2) = 2.75
    (3.3)    X_3:    E(X_3) = 5.75
    (3.4)    X_4:    E(X_4) = 5.50

The corresponding probability charts are drawn in Figure 22.

The reader can check that \(X_1\) and \(X_2\) have equal means. To distinguish
\(X_1\) from \(X_2\) requires that we have a measure of the extent
to which the values of the random variables spread out along the
horizontal axis. We would certainly expect of such a measure that it
would be larger for \(X_2\) than for \(X_1\), reflecting the fact that graph (b)
is more spread out than graph (a), in Figure 22.

The random variable \(X_3\) is obtained by adding 3 to each value of
\(X_1\); i.e., \(X_3 = X_1 + 3\). As we showed in the preceding section, the
mean is thereby increased by 3. A glance at graphs (a) and (c) in
Figure 22 shows they are identical, except that graph (c) is 3 units
further along the x-axis. But the graphs show the same variability
in the values of \(X_1\) and \(X_3\), and we therefore expect our measure of
dispersion to be equal for \(X_1\) and \(X_3\).


[Figure 22: probability charts of (a) \(X_1\), (b) \(X_2\), (c) \(X_3\), and (d) \(X_4\),
each drawn over the x-axis from -1 to 8 with vertical scale from 0 to 3/8.]


On the other hand, \(X_4\) is obtained by multiplying each value of
\(X_1\) by 2; i.e., \(X_4 = 2X_1\). The mean is thereby also doubled, but now
graph (d) is more spread out on the axis than graph (a) in Figure 22.
We therefore expect our measure of dispersion to be larger for \(X_4\)
than for \(X_1\). Graphs (b) and (d) are harder to compare by eye, and
it is clear that we must now leave these special examples and somehow
obtain a numerical measure of dispersion that will apply to any random
variable.


A first attempt to formulate a precise definition of the spread of the
values of a random variable might proceed as follows. Choose some
central or average value of X, say E(X). For each possible value \(x_k\)
of the random variable X, the number \(x_k - E(X)\) measures the deviation
of \(x_k\) from E(X). Compute this deviation for k = 1, 2, ..., N.
Finally, form the weighted average of these deviations, using as
weight for the kth deviation the probability \(f(x_k)\) with which the
value \(x_k\) (and hence the deviation \(x_k - E(X)\)) occurs. We are thus
led to the number

\[ \sum_{k=1}^{N} [x_k - E(X)] f(x_k) = E[X - E(X)], \tag{3.5} \]

which, disappointingly, is useless as a measure of spread, since we
showed in (2.8) that it is zero for all random variables.

A second attempt would follow the realization that the sum in (3.5)
is the weighted average of the algebraic or signed deviations \(x_k - E(X)\)
and that, after being properly weighted, these deviations, some positive
and some negative, add to zero. When measuring the spread of
the values of a random variable, we should be concerned with the
magnitude of \(x_k - E(X)\), but not with its sign. In other words, we
care only about how far \(x_k\) is from the mean, not about whether it is
less than or greater than the mean. Although not the only way to
accomplish this (see Problem 3.13), the most mathematically tractable
way is to square each deviation and then compute the weighted
average of these squared deviations. We are thus led to the following
definition. (To simplify notation, we here introduce the symbol \(\mu_X\)
for the mean of the random variable X. From now on, we use \(\mu_X\)
and E(X) interchangeably for the mean of X.)

Definition 3.1. Let X be a random variable whose possible values
\(x_1, x_2, \ldots, x_N\) occur with probabilities \(f(x_1), f(x_2), \ldots, f(x_N)\),
respectively. Let \(\mu_X = E(X)\) be the mean of X. The variance of X,
denoted by Var(X) or \(\sigma_X^2\), is defined as the number

\[ \sigma_X^2 = \operatorname{Var}(X) = E[(X - \mu_X)^2] \tag{3.6} \]

or equivalently, by (2.6),

\[ \sigma_X^2 = \operatorname{Var}(X) = \sum_{k=1}^{N} (x_k - \mu_X)^2 f(x_k). \tag{3.7} \]

The nonnegative number

\[ \sigma_X = \sqrt{\operatorname{Var}(X)} \tag{3.8} \]

is called the standard deviation of the random variable X.

Let us use this definition to compute the variances of the random
variables whose probability functions are given in (3.1)-(3.4). The
computation of \(\mu_1 = E(X_1)\) and \(\sigma_1^2 = \operatorname{Var}(X_1)\) is summarized in Table
24. (Note that we avoid subscripts on subscripts by writing \(\mu_1\) and \(\sigma_1\)
in place of \(\mu_{X_1}\) and \(\sigma_{X_1}\).)

TABLE 24

    x_k    f_1(x_k)    x_k f_1(x_k)    x_k - mu_1    (x_k - mu_1)^2    (x_k - mu_1)^2 f_1(x_k)

    1      1/8         1/8             -7/4          49/16             49/128
    2      2/8         4/8             -3/4           9/16             18/128
    3      3/8         9/8              1/4           1/16              3/128
    4      2/8         8/8              5/4          25/16             50/128

    Sums:  1           22/8                                            120/128

\[ \mu_1 = E(X_1) = \sum_{k=1}^{4} x_k f_1(x_k) = \frac{22}{8} = 2.75 \]

\[ \sigma_1^2 = \operatorname{Var}(X_1) = \sum_{k=1}^{4} (x_k - \mu_1)^2 f_1(x_k) = \frac{120}{128} = .9375 \]

\[ \sigma_1 = \sqrt{\operatorname{Var}(X_1)} = .97, \text{ approx.} \]


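Proceeding in this way, we compute the following values:

    \(\sigma_1^2 = \operatorname{Var}(X_1) = .9375\)     \(\sigma_1 = .97\)
    \(\sigma_2^2 = \operatorname{Var}(X_2) = 6.1875\)    \(\sigma_2 = 2.49\)
    \(\sigma_3^2 = \operatorname{Var}(X_3) = .9375\)     \(\sigma_3 = .97\)
    \(\sigma_4^2 = \operatorname{Var}(X_4) = 3.75\)      \(\sigma_4 = 1.94\)

These values bear out the expectations we formed before the variances
were computed. If we take the variance as our measure of spread, then
\(X_2\) shows the largest spread, \(X_4\) spreads less than \(X_2\) but more
than \(X_1\), and \(X_1\) and \(X_3\) show the same spread. The same relative
magnitudes hold among the standard deviations. (Recall that
\(X_4 = 2X_1\) and then note that \(\sigma_4/\sigma_1 = 2\). This is a special case
of a general result, Theorem 3.3.) Both the variance and the standard
deviation thus measure the spread or dispersion of the values of a
random variable. The usefulness of these concepts in the general
theory will become apparent as our work develops.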

One difficulty with the variance is that it does not measure dispersion
in the same units as the values of X. Thus, if X has dollar
values, then E(X) is a certain number of dollars but, since the variance
is the mean square deviation, Var(X) is measured in dollars
squared. It is in order to have a measure of dispersion in the same
units as the values of X that we define the standard deviation as the
square root of the variance.

We turn now to some general results concerning the variance of a
random variable. Each term in the sum (3.7) that defines Var(X)
is nonnegative, and so the entire sum is either zero or a positive number.
The sum is zero if and only if each term in the sum is zero. Since
\(f(x_k) > 0\), this means \(x_k = \mu_X\) for all k. We have therefore proved
the following result.

Theorem 3.1. For any random variable X, we have

\[ \operatorname{Var}(X) \ge 0, \tag{3.9} \]

the equality holding if and only if there is only one possible value of
X, this value therefore occurring with probability 1.

Although the calculations summarized in Table 24 are not difficult, 
it would be gratifying to have a simpler way of computing the vari- 
ance of a random variable than by the use of the defining equation 
(3.7). Our next result gives us the required formula. 

Theorem 3.2. The variance of X is obtained by subtracting the
square of the mean of X from the mean of \(X^2\). In symbols,

\[ \operatorname{Var}(X) = E(X^2) - \mu_X^2. \tag{3.10} \]

Proof. We expand the summand in (3.7) and obtain

\[ \operatorname{Var}(X) = \sum_{k=1}^{N} (x_k^2 - 2\mu_X x_k + \mu_X^2) f(x_k)
   = \sum_{k=1}^{N} x_k^2 f(x_k) - 2\mu_X \sum_{k=1}^{N} x_k f(x_k) + \mu_X^2 \sum_{k=1}^{N} f(x_k). \]

The first sum is \(E(X^2)\) by (2.4), the second sum defines \(\mu_X\), and the
probabilities in the third sum add to 1. Hence

\[ \operatorname{Var}(X) = E(X^2) - 2\mu_X^2 + \mu_X^2, \]

from which (3.10) follows immediately.

It is important to distinguish clearly between the "mean square"
\(E(X^2)\) and the "square mean" \(\mu_X^2 = [E(X)]^2\) in Formula (3.10).
This formula is usually used to compute Var(X). In Table 25, we
summarize the calculations involved in finding the variance of the
random variable \(X_1\) whose probability function is given in (3.1). A
comparison with Table 24 will show how much simpler it is to use
(3.10) rather than (3.7) to compute the variance.

TABLE 25

    x_k    f_1(x_k)    x_k f_1(x_k)    x_k^2 f_1(x_k)

    1      1/8         1/8             1/8
    2      2/8         4/8             8/8
    3      3/8         9/8             27/8
    4      2/8         8/8             32/8

    Sums:  1           22/8            68/8

\[ \mu_1 = E(X_1) = \sum_{k=1}^{4} x_k f_1(x_k) = \frac{22}{8} = 2.75 \]

\[ E(X_1^2) = \sum_{k=1}^{4} x_k^2 f_1(x_k) = \frac{68}{8} = 8.5 \]

\[ \sigma_1^2 = \operatorname{Var}(X_1) = E(X_1^2) - \mu_1^2 = 8.5 - (2.75)^2 = .9375 \]

\[ \sigma_1 = \sqrt{\operatorname{Var}(X_1)} = .97, \text{ approx.} \]
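
The calculations of Tables 24 and 25 can be reproduced in a few lines of Python. This is a sketch only: the function names are illustrative assumptions, and the probability function used is the one read off the tables (values 1, 2, 3, 4 with probabilities 1/8, 2/8, 3/8, 2/8).

```python
import math
from fractions import Fraction

# Probability function of X_1 as it appears in Tables 24 and 25.
f1 = {1: Fraction(1, 8), 2: Fraction(2, 8), 3: Fraction(3, 8), 4: Fraction(2, 8)}


def mean(pmf):
    """E(X): sum of x_k f(x_k)."""
    return sum(x * p for x, p in pmf.items())


def variance_by_definition(pmf):
    """Formula (3.7): weighted average of the squared deviations from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_by_shortcut(pmf):
    """Formula (3.10): mean square minus square mean."""
    mean_square = sum(x * x * p for x, p in pmf.items())
    return mean_square - mean(pmf) ** 2


var_table_24 = variance_by_definition(f1)
var_table_25 = variance_by_shortcut(f1)
sigma_1 = math.sqrt(var_table_25)  # standard deviation, Formula (3.8)
```

Both functions return 15/16 = .9375 exactly; the shortcut avoids computing any deviations.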


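Example 3.1. If X denotes the number of points obtained in a
roll of a fair die, then we computed \(E(X) = \tfrac{7}{2}\) in Example 2.1 and
\(E(X^2) = \tfrac{91}{6}\) in Example 2.6. Applying (3.10) we find

\[ \operatorname{Var}(X) = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12} \quad \text{and} \quad \sigma_X = \sqrt{\frac{35}{12}} = 1.7, \text{ approx.} \]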


In Theorem 2.2, we studied the effect on the mean of changing
each value of a random variable by (1) adding or subtracting a fixed
number, and (2) multiplying or dividing by a fixed number. Changes
of the first kind are known as changes in the location of the origin on
the horizontal axis of the probability chart of the random variable;
changes of the second kind are known as changes in the scale on this
axis. For example, in Figure 22 the graph of \(X_3\) in (c) can be obtained
from the graph of \(X_1\) in (a) by a change in location of the origin: if
we shift the number 0 (and all other numbers) three units to the left,
then with this relabeling of the axis graph (a) becomes graph (c). But
the graph of \(X_4\) in (d) is obtained from the graph of \(X_1\) in (a) by a
change in scale: if we make each unit on the axis in (a) two units,
then with this relabeling graph (a) becomes graph (d). It is as if the
axis in (a) measured the values of \(X_1\) in units of quarts, let us say,
whereas the axis in (d) measured the same variable in units of pints.
In the following theorem, we study the effect on the variance and the
standard deviation of changes in location of the origin and changes
in scale.

Theorem 3.3. If a and b are any numbers, then

\[ \operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X), \tag{3.11} \]
\[ \sigma_{aX+b} = |a| \, \sigma_X. \tag{3.12} \]

Proof. If we apply the defining equation (3.6) to the random variable
aX + b, then we find

\[ \operatorname{Var}(aX + b) = E([aX + b - E(aX + b)]^2). \]

By (2.7), this simplifies as follows:

\[ \operatorname{Var}(aX + b) = E([aX + b - aE(X) - b]^2)
   = E(a^2 [X - E(X)]^2)
   = a^2 E([X - E(X)]^2)
   = a^2 \operatorname{Var}(X). \]

Hence

\[ \sigma_{aX+b} = \sqrt{\operatorname{Var}(aX + b)} = \sqrt{a^2} \, \sqrt{\operatorname{Var}(X)} = |a| \, \sigma_X, \]

where |a| denotes the absolute value of the number a. (Note that
when we write the square root sign, then by definition we mean the
nonnegative square root. Hence it is not correct to replace \(\sqrt{a^2}\) by a.
Instead, \(\sqrt{a^2} = a\) if \(a \ge 0\) and \(\sqrt{a^2} = -a\) if \(a < 0\); i.e., \(\sqrt{a^2} = |a|\).)

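
Theorem 3.3 can be illustrated numerically. The sketch below reuses the probability function of \(X_1\) read off Table 24 and builds \(X_1 + 3\) and \(2X_1\) with a helper named `transform`; the helper and its name are assumptions introduced for this illustration.

```python
from fractions import Fraction

f1 = {1: Fraction(1, 8), 2: Fraction(2, 8), 3: Fraction(3, 8), 4: Fraction(2, 8)}


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Formula (3.7)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def transform(pmf, a, b):
    """Probability function of aX + b; requires a != 0 so distinct values stay distinct."""
    return {a * x + b: p for x, p in pmf.items()}


var_1 = variance(f1)
var_shifted = variance(transform(f1, 1, 3))    # change of origin: X_1 + 3
var_scaled = variance(transform(f1, 2, 0))     # change of scale: 2 X_1
var_negative = variance(transform(f1, -2, 5))  # a negative factor still multiplies by a^2
```

The shift leaves the variance at .9375, while either factor of magnitude 2 raises it to 3.75, in agreement with (3.11).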
From (3.11) and (3.12) we conclude that

\[ \operatorname{Var}(X + b) = \operatorname{Var}(X), \qquad \sigma_{X+b} = \sigma_X, \]

and

\[ \operatorname{Var}(aX) = a^2 \operatorname{Var}(X), \qquad \sigma_{aX} = |a| \, \sigma_X. \]

In words, adding a fixed amount to every value of a random variable has
no effect on the variance and the standard deviation of the random variable,
but multiplying each value of a random variable by the same factor a
multiplies the variance by \(a^2\) and the standard deviation by |a|.

Because of its importance in our later work, we state the following
special case of Theorem 3.3. The proof is left for the problems.


Theorem 3.4. Let X be any random variable with mean \(\mu_X\) and
standard deviation \(\sigma_X > 0\). Let the random variable X* be defined
as follows:

\[ X^* = \frac{X - \mu_X}{\sigma_X}. \tag{3.13} \]

(X* is called the standardized random variable corresponding to X.)
Then

\[ E(X^*) = 0 \quad \text{and} \quad \operatorname{Var}(X^*) = 1; \tag{3.14} \]

i.e., the standardized random variable has mean 0 and standard
deviation 1.
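
A quick numerical illustration of Theorem 3.4 follows. It is a sketch, not a proof: it standardizes the random variable \(X_1\) of Table 24 in floating point (because \(\sigma_X\) is irrational here), so the results match 0 and 1 only up to rounding error. All variable names are illustrative.

```python
import math

# Probability function of X_1 from Table 24, in floating point.
f1 = {1: 1 / 8, 2: 2 / 8, 3: 3 / 8, 4: 2 / 8}

mu = sum(x * p for x, p in f1.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in f1.items()))

# Formula (3.13): each value x_k is replaced by (x_k - mu)/sigma; probabilities are unchanged.
f_star = {(x - mu) / sigma: p for x, p in f1.items()}

mean_star = sum(z * p for z, p in f_star.items())
var_star = sum((z - mean_star) ** 2 * p for z, p in f_star.items())
```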


Example 3.2. In a manufactured lot, there is a proportion p of
defective items. An item is chosen at random from the lot. Let X
have the value 1 if the selected item is defective, and 0 otherwise.
Thus the possible values of X are 1 and 0, and these occur with
probability p and q = 1 - p, respectively. Hence

\[ \mu_X = 1 \cdot p + 0 \cdot q = p, \]
\[ E(X^2) = 1^2 \cdot p + 0^2 \cdot q = p, \]
\[ \sigma_X^2 = E(X^2) - \mu_X^2 = p - p^2 = p(1 - p) = pq, \]

and the standardized random variable corresponding to X is

\[ X^* = \frac{X - p}{\sqrt{pq}}. \]


Example 3.3. To each value of a random variable X there corresponds
a value of the corresponding standardized variable X*, and
vice versa. If the value of X is a test score, then X* is the corresponding
standard score. To interpret the standard score, we solve (3.13)
for X and obtain

\[ X = \mu_X + X^* \sigma_X. \]

Thus, if we are told that the value of the standard score X* is some
number, say z, then the corresponding value of the actual score X is
z standard deviations removed from the mean, being above the mean
if z > 0 and below if z < 0. A standard score of +2 means an actual
score of 2 standard deviations above the mean score, etc.


The following theorem, due to the Russian mathematician P. L.
Chebyshev (1821-1894), gives us further insight into the significance
of the standard deviation as a measure of the dispersion of the values
of a random variable about the mean.


Theorem 3.5. Let X be a random variable with mean \(\mu_X\) and
standard deviation \(\sigma_X > 0\). Let c be any positive number. Then the
probability that a value of X occurs that differs from \(\mu_X\) by more
than c is less than \(\sigma_X^2 / c^2\). In symbols,

\[ P(|X - \mu_X| > c) < \frac{\sigma_X^2}{c^2}. \tag{3.15} \]

Proof. We start with Formula (3.7) for the variance of X:

\[ \sigma_X^2 = \sum_{k=1}^{N} (x_k - \mu_X)^2 f(x_k). \]

Since each term of this sum is nonnegative, omitting some terms cannot
increase the value of the sum. Therefore, if we delete all terms
(if any) for which \(|x_k - \mu_X| \le c\), we obtain

\[ \sigma_X^2 \ge {\sum_k}^{*} (x_k - \mu_X)^2 f(x_k), \]

where the asterisk indicates that the summation extends only over
those k for which \(|x_k - \mu_X| > c\). It follows that we further decrease
this sum if we replace each \(|x_k - \mu_X|\) by c; i.e.,

\[ \sigma_X^2 > {\sum_k}^{*} c^2 f(x_k) = c^2 {\sum_k}^{*} f(x_k). \]

But

\[ {\sum_k}^{*} f(x_k) = {\sum_k}^{*} P(X = x_k) = P(|X - \mu_X| > c). \]

Hence

\[ \sigma_X^2 > c^2 P(|X - \mu_X| > c), \]

and the result follows by dividing both sides by \(c^2\).


From Formula (3.15) we see that with $c$ fixed, the smaller the variance
of $X$, the smaller the upper bound on the probability that a value of $X$
occurs that deviates from $\mu_X$ by more than $c$. Thus the variance in
this sense controls the spread or dispersion of the values of the random
variable $X$ about the mean. To be somewhat more precise, it is convenient
to obtain an alternate form of (3.15) by substituting $z\sigma_X$ for $c$.
One thus obtains

(3.16)    $P(|X - \mu_X| > z\sigma_X) < \dfrac{1}{z^2},$

or equivalently,

(3.17)    $P(|X - \mu_X| \le z\sigma_X) > 1 - \dfrac{1}{z^2}.$

Formula (3.17) can be written more succinctly if we introduce the
standardized random variable $X^*$ defined in Theorem 3.4. We obtain

(3.18)    $P(|X^*| \le z) > 1 - \dfrac{1}{z^2}.$


Formulas (3.15)-(3.18) are alternate forms of Chebyshev's inequality.

If $0 < z < 1$, then the inequality does not yield any useful
information. For then $1/z^2 > 1$, and (3.16) merely asserts the obvious
fact that a probability is less than a number greater than 1.

But if $z > 1$, then Chebyshev's inequality gives us some information
about the probability function of $X$. For example, if we put $z = 2$
in (3.17), then $P(|X - \mu_X| \le 2\sigma_X) > 3/4$. In words, the event that a
random variable assumes a value that is within two standard deviations
of its mean has probability greater than 3/4. Put differently, a
total probability of more than 3/4 is accounted for by values of $X$ in
the interval $[\mu_X - 2\sigma_X, \mu_X + 2\sigma_X]$. If $z = 3$, then we similarly
conclude that a total probability of more than 8/9 is accounted for by
values of $X$ in the interval $[\mu_X - 3\sigma_X, \mu_X + 3\sigma_X]$.

By using (3.18), these facts can equally well be expressed in terms of
the standardized random variable $X^*$. For example,

$P(-2 \le X^* \le 2) > 3/4$,    $P(-3 \le X^* \le 3) > 8/9$,    etc.

Either way, we see how the spread or dispersion of the values of the
random variable $X$ about the mean $\mu_X$ is controlled by the standard
deviation $\sigma_X$.

Theorem 3.5 is extraordinarily general; the probability statements
given by Chebyshev's inequality apply to any random variable. One
pays a price for such generality, since one cannot expect an inequality
that applies to all random variables to be especially sharp and
definitive when applied to some specific random variable. (See Problem
3.17.) Nevertheless, Chebyshev's theorem is an important analytic
tool in the theory of probability. We shall have occasion to use it in
a later section when we prove the so-called law of large numbers.
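To see how conservative the bound in (3.17) can be, the following Python sketch compares the exact probability with the Chebyshev lower bound for one small random variable. The choice of random variable (the number of heads in three tosses of a fair coin) and all function names are assumptions made for illustration; exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction


def mean(pf):
    """Mean of a random variable given as {value: probability}."""
    return sum(x * p for x, p in pf.items())


def variance(pf):
    """Variance computed from the definition, as in Formula (3.7)."""
    mu = mean(pf)
    return sum((x - mu) ** 2 * p for x, p in pf.items())


def prob_within(pf, z):
    """Exact P(|X - mu| <= z * sigma); squares are compared to avoid square roots."""
    mu, var = mean(pf), variance(pf)
    return sum(p for x, p in pf.items() if (x - mu) ** 2 <= z * z * var)


# Number of heads in three tosses of a fair coin (illustrative choice).
heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

for z in (Fraction(3, 2), Fraction(2)):
    exact = prob_within(heads, z)
    bound = 1 - 1 / (z * z)  # right-hand side of (3.17)
    print(f"z = {z}: exact probability {exact}, Chebyshev lower bound {bound}")
```

In both cases the exact probability exceeds the bound, as (3.17) requires, but by a visible margin, which illustrates the price paid for an inequality that holds for every random variable.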


PROBLEMS 
3.1. A questionnaire sent to four families yields the following information.

     "Own TV set?"    Total Income    Number of Children
     yes              $10,000         2
     yes                5,000         3
     yes                8,000         0
     no                 5,000         2

     One of these families is chosen at random. Let $X$ have the value 1 if
     the family owns a television set and the value 0 otherwise, let $Y$ be
     the income of the family, and let $Z$ be the number of children in the
     family. Find the mean, variance, and standard deviation of each of
     these random variables.

3.2. Suppose 70 percent of the voters favor a certain proposal, 30 percent
     being opposed. A voter is selected at random and we let $X = 0$ if he
     is opposed, $X = 1$ if he is in favor. Find $E(X)$ and $\mathrm{Var}(X)$.

3.3. Let $X$ denote the sum of the numbers obtained when two fair dice are
     rolled. Find the variance and standard deviation of $X$. (Cf. Problem
     2.13.)

3.4. (a) Consider the random variable $X$ defined in Example 3.2. Show
     that $\mathrm{Var}(X) \le 1/4$. For what value of $p$ is $\mathrm{Var}(X) = 1/4$?
     (b) Generalize (a) by showing that if $X$ is any random variable such
     that $E(X^2) = E(X)$, then $\mathrm{Var}(X) \le 1/4$.

3.5. (a) Let $X_k$ be the number of heads obtained when a fair coin is tossed
     $k$ independent times. For $k = 1, 2, 3, 4$, calculate the variance and
     standard deviation of $X_k$.
     (b) Redo part (a), but now assume the coin is biased so that the
     probability is $p$ ($0 < p < 1$) that it falls heads on any toss.

3.6. In Problem 2.18, six random variables were shown to have the same
     mean. Find the variance of each random variable. Interpreting
     variance as a measure of spread, does it seem reasonable that $\mathrm{Var}(X_k)$
     should decrease as $k$ increases?

3.7. The mean and variance of $X$ are 50 and 4, respectively. Evaluate (a)
     the mean of $X^2$, (b) the variance of $2X + 3$, (c) the standard deviation
     of $2X + 3$, (d) the variance of $-X$, (e) the standard deviation of $-X$.

3.8. Prove the following result for any number $a$:

     $$E([X - a]^2) = \mathrm{Var}(X) + (E(X) - a)^2.$$

     Use this formula to show that $E([X - a]^2)$ is minimized when
     $a = E(X)$; i.e., the mean of the squared deviations of the values of a
     random variable is as small as possible when the deviations are
     computed from the mean of the random variable. (Note. We remarked in
     Section 2 that if $f(x_k)$ is thought of as the mass of a particle at the
     point $x_k$ on the $x$-axis, then $E(X)$ as given by Formula (2.1) is the
     center of gravity of the system of masses at $x_1, x_2, \ldots, x_N$. With
     this same interpretation, $\mathrm{Var}(X)$ as given by Formula (3.7) becomes
     what the physicist calls the moment of inertia of the mass system with
     respect to an axis through the center of gravity and perpendicular to
     the $x$-axis. The reader can verify that the equation established here is
     the mathematical formulation of the following parallel axis theorem:
     The moment of inertia of a mass system about any given axis is the
     moment of inertia of the system about a parallel axis through the
     center of gravity, plus the moment of inertia about the given axis if
     all the mass were concentrated at the center of gravity.)


3.9. The random variable $X$ is given and we define a new random variable
     $Y = g(X)$, as in Definition 2.2. If $g(x) = a + bx + cx^2$, show that

     $$E(Y) = a + bE(X) + c[E(X)]^2 + c\,\mathrm{Var}(X).$$

3.10. An urn contains six balls. Three have 1's on them, one has a 2, and
      two have 3's. One ball is drawn from the urn and then, without
      replacing the first, another is drawn. Let $X_1$ be the number on the
      first ball and $X_2$ the number on the second ball. Find the standard
      deviations of $X_1$ and $X_2$.


3.11. A subject is shown a deck of three cards numbered 1, 2, and 3. The 
cards are shuffled and placed face down on the table. The subject is 
asked to call the order of the cards. Let X denote the number of correct 
calls made by the subject. Consider the following possible ways that 
a subject might guess: 

(1) He chooses one card and calls it three straight times. 

(2) He makes three independent guesses. For example, he can roll a 
fair die and guess the first card is 1 if a 1 or a 2 comes up, guess 
the first card is 2 if a 3 or 4 comes up, and guess the first card is 
3 if a 5 or a 6 comes up. The die is then thrown twice more to 
determine the subject’s second and third calls. 

(3) He chooses at random one permutation from among all the permu- 
tations of the numbers 1, 2, 3 and calls his guesses in the order 
specified by the selected permutation. (Cf. Problem II.3.9.)

For each of these methods of guessing, find (a) the probability function 

of the random variable X, (b) the mean of X, and (c) the standard 

deviation of X. 


3.12. Let $X$ denote the number obtained when one number is selected at
      random from the numbers $1, 2, 3, \ldots, N$. Show that

      $$E(X) = \frac{N + 1}{2}, \qquad \mathrm{Var}(X) = \frac{N^2 - 1}{12}.$$
3.13. Another measure of the spread* of the values of a random variable is
      the mean absolute deviation defined as the number

      $$\sum_{k=1}^{N} |x_k - \mu_X| f(x_k).$$

      Compute the mean absolute deviation of the random variables whose
      probability functions are given in (3.1)-(3.4) and compare with their
      standard deviations.

      * For an interesting discussion of measures of variability and their
      use to measure risk in a portfolio of securities, see H. Markowitz,
      Portfolio Selection, John Wiley and Sons, Inc., 1959, pp. 286-297.

3.14. Prove Theorem 3.4. 

3.15. A random variable X has mean 100 and standard deviation 10. X* is 
the standardized random variable corresponding to X. (a) What value 
of X* corresponds to each of the following values of X: 85, 100, 103? 
(b) What value of X corresponds to each of the following values of X*: 
—2, —1, —0.4, 1.3? 

3.16. In the proof of Theorem 3.5, where was the hypothesis $\sigma_X > 0$ used?
      Is (3.15) true if $\sigma_X = 0$?


3.17. For each of the following random variables, calculate
      $P(|X - \mu_X| \le z\sigma_X)$
      for $z = 1.5$ and $z = 2$, and compare these probabilities with the
      corresponding estimates given by Chebyshev's inequality.
(a) X, the number of points obtained in a roll of a fair die. 
(b) X, the sum of the number of points on two fair dice. 
(c) X, the number of heads obtained when four fair coins are tossed. 


3.18. You are told that no possible value of a random variable $X$ is more
      than one standard deviation from the mean; i.e., all possible values
      are in the interval $[\mu_X - \sigma_X, \mu_X + \sigma_X]$. Show that $X$ either has only
      one possible value, this value therefore occurring with probability 1,
      or $X$ has two possible values, each occurring with probability 1/2.


3.19. Let $X$ be any random variable and consider the statement
      $P(-z \le X^* \le z) > p$.
      For each of the following values of $p$ find the smallest value of $z$
      (according to Chebyshev's inequality) that makes the statement true:
      $p = 0.5$, $p = 0.9$, $p = 0.95$, $p = 0.99$.


3.20. The random variable $X$ is given and the new random variable $Y = g(X)$
      is defined. Suppose that the possible values of $Y$ are all nonnegative
      and that not all are zero. For any positive number, say $c^2$, prove that

      $$P(Y > c^2) < \frac{E(Y)}{c^2}.$$

      [Note. This formula generalizes Chebyshev's theorem, for we obtain
      (3.15) as a special case if we put $g(x) = (x - \mu_X)^2$.]


4. Joint probability functions; independent random variables 


When an experiment is performed, we are often interested in more
than one characteristic of the resulting outcome. If 13 cards are
dealt from a full deck, we might be interested in the number of spades
in the hand and the number of aces; if a person is selected from a
certain population, we might want to record his height and weight, his
IQ test score and the average number of hours he watches television,
etc. In such cases, we are interested not only in studying each
characteristic separately, but also in determining interrelationships
that exist among the characteristics.

In mathematical terms, we are given a sample space S and n ran- 
dom variables defined on S, where n is an integer greater than or equal 
to 2. In this section, we study the bivariate case (n = 2), concluding 
with some remarks on the more general multivariate case (n > 2). We 
begin with an example that serves to prepare the way for the formal 
development that follows. 


Example 4.1. A fair coin is tossed three independent times. We
choose the familiar set

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}

as sample space and assign probability 1/8 to each simple event. We
define the following random variables: 


X = 0 if the first toss is a tail,
    1 if the first toss is a head,
Y = the total number of heads,
Z = the absolute value of the difference between the number
    of heads and tails.


(Note that when we define random variables in this way, the equality 
sign is used as shorthand for “is the random variable whose value for 
any outcome (element of S) is". The distinction between the random 
variable and the value of the random variable should be kept clearly 
in mind even when, as here, the customary notation is somewhat
misleading.) 

We list in Table 26 the values of these three random variables for
each element of the sample space $S$. Consider first the pair $X$, $Y$.
We want to determine not only the possible pairs of values of $X$ and
$Y$, but also the probability with which each such pair occurs. To say,
for example, that $X$ has the value 0 and $Y$ the value 1 is to say that
the event {THT, TTH} occurs. The probability of this event is
therefore 2/8 or 1/4. We write

$P(X = 0, Y = 1) = 1/4,$

adopting the usual convention in which a comma is used in place of
$\cap$ to denote the intersection of the two events $X = 0$ and $Y = 1$. We
similarly find

$P(X = 0, Y = 0) = P(\{TTT\}) = 1/8,$
$P(X = 1, Y = 0) = P(\emptyset) = 0,$ etc.

TABLE 26

Element of S    Value of X    Value of Y    Value of Z
HHH             1             3             3
HHT             1             2             1
HTH             1             2             1
THH             0             2             1
HTT             1             1             1
THT             0             1             1
TTH             0             1             1
TTT             0             0             3

In this way we obtain the probabilities of all possible pairs of values
of $X$ and $Y$. These probabilities are conveniently arranged in Table
27, the so-called joint probability table of $X$ and $Y$.

TABLE 27

  x \ y      |  0      1      2      3    | P(X = x)
  0          |  1/8    1/4    1/8    0    | 1/2
  1          |  0      1/8    1/4    1/8  | 1/2
  P(Y = y)   |  1/8    3/8    3/8    1/8  | 1


We can also represent these results graphically as in Figure 23.
In Figure 23(a) a heavy dot is located at each point $(x, y)$ for which
$P(X = x, Y = y)$ is positive, and this probability appears next to
the dot. In Figure 23(b) we draw a three-dimensional chart in which
$P(X = x, Y = y)$ is the height of a vertical line drawn above the
point $(x, y)$ in the horizontal $x$-$y$ plane.

Figure 23


The event $Y = 0$ is the union of the mutually exclusive events
$(X = 0, Y = 0)$ and $(X = 1, Y = 0)$. Hence

$P(Y = 0) = P(X = 0, Y = 0) + P(X = 1, Y = 0) = 1/8 + 0 = 1/8.$

In Table 27, this probability is obtained as the sum of the entries in
the column headed $y = 0$. By adding the entries in the other columns,
we similarly find

$P(Y = 1) = 3/8$,    $P(Y = 2) = 3/8$,    $P(Y = 3) = 1/8$.

[...] marginal probability function [...] value of $Y$ occurs. For example,
$P(Y = 2) = 3/8$. But if we are told that the value of $X$ is 1, then the
conditional probability of the event $Y = 2$ becomes 1/2. For, by the
definition of conditional probability,

$$P(Y = 2 \mid X = 1) = \frac{P(X = 1, Y = 2)}{P(X = 1)} = \frac{1/4}{1/2} = \frac{1}{2}.$$

As we expect, the events $X = 1$ and $Y = 2$ are not independent:
knowing that the first toss results in a head increases the probability
of obtaining exactly two heads in the three tosses.

What we have done for the pair $X$, $Y$ can also be done for $X$, $Z$.
We give the results only, asking the reader to check our calculations.
The joint probability table of $X$ and $Z$ is given in Table 28. We
have, as before, written in the margins the row-sums and the
column-sums which determine the (marginal) probability functions of $X$
and $Z$, respectively. In Figure 24, we graph these results as we did for
$X$, $Y$ in Figure 23.

TABLE 28

  x \ z      |  1      3    | P(X = x)
  0          |  3/8    1/8  | 1/2
  1          |  3/8    1/8  | 1/2
  P(Z = z)   |  3/4    1/4  | 1

Figure 24


Finally, let us observe that the events $X = 0$ and $Z = 1$ are
independent, since we find $P(X = 0, Z = 1) = 3/8$, and this is
equal to the product of $P(X = 0) = 1/2$ and $P(Z = 1) = 3/4$. This is
reflected in Table 28 by the fact that the probability appearing in
the cell determined by the row labeled $x = 0$ and the column labeled
$z = 1$ is the product of the marginal totals for that row and column.
Indeed, this multiplication property holds for each of the four entries 
in the joint probability table of X and Z. A comparison with Table 
27 will show that the entries in the cells of the joint probability table 
of X and Y are not products of the corresponding marginal prob- 
abilities. Thus the random variables X and Y have a relationship 
to each other that is different from that shown by X and Z. According 
to the definitions to be given below, we say that the random variables 
X and Y are dependent, but that X and Z are independent. 
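The calculations of Example 4.1 can be reproduced by brute-force enumeration of the eight outcomes. The following Python sketch is only an illustration: the helper names `joint`, `marginals`, and `independent` are assumptions introduced here, and the independence test simply checks the multiplication property for every cell of the joint table.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space of three tosses; each simple event has probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]
p = Fraction(1, len(outcomes))


def X(o):
    return 1 if o[0] == "H" else 0          # 1 if the first toss is a head


def Y(o):
    return o.count("H")                     # total number of heads


def Z(o):
    return abs(o.count("H") - o.count("T")) # |heads - tails|


def joint(U, V):
    """Joint probability table of U and V as {(u, v): probability}."""
    h = defaultdict(Fraction)
    for o in outcomes:
        h[(U(o), V(o))] += p
    return dict(h)


def marginals(h):
    """Row-sums and column-sums of a joint table."""
    f, g = defaultdict(Fraction), defaultdict(Fraction)
    for (u, v), pr in h.items():
        f[u] += pr
        g[v] += pr
    return dict(f), dict(g)


def independent(h):
    """True if every cell equals the product of its marginal totals."""
    f, g = marginals(h)
    return all(h.get((u, v), 0) == f[u] * g[v] for u in f for v in g)


print(independent(joint(X, Y)))  # the pair X, Y
print(independent(joint(X, Z)))  # the pair X, Z
```

Pairs of values that never occur are absent from the dictionary and are treated as having probability zero, which is what makes the cell $(x, y) = (1, 0)$ fail the multiplication test for $X$ and $Y$.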


With this particular example understood, we can now proceed to 


discuss the general case of any two random variables defined on the 
same sample space. 


Definition 4.1. Let a sample space $S = \{o_1, o_2, \ldots, o_n\}$ be given
together with an acceptable assignment of probabilities to its simple
events. Let $X$ and $Y$ be random variables defined on $S$. Then the
function $h$ whose value at the point $(x, y)$ is given by

(4.1)    $h(x, y) = P(X = x, Y = y) = P(\{o \in S \mid X(o) = x \text{ and } Y(o) = y\})$

is called the joint probability function of the random variables $X$ and
$Y$. (The domain of the function $h$ is the set of all ordered pairs of
real numbers, although $h$ has nonzero values for only a finite number
of such pairs.)


Let us suppose that $X$ has possible values $x_1, x_2, \ldots, x_M$ and
probability function $f$. Then

(4.2)    $f(x_j) = P(X = x_j) > 0, \qquad \sum_{j=1}^{M} f(x_j) = 1.$

Similarly, if $Y$ has possible values $y_1, y_2, \ldots, y_N$ and probability
function $g$, then

(4.3)    $g(y_k) = P(Y = y_k) > 0, \qquad \sum_{k=1}^{N} g(y_k) = 1.$


With this notation, the joint probability table of $X$ and $Y$ is defined
as the double-entry array in Table 29. The probabilities listed in
this table have the following properties:

(4.4)    $h(x_j, y_k) \ge 0$    for $j = 1, 2, \ldots, M$; $k = 1, 2, \ldots, N$.

(4.5)    $\sum_{\text{all } j,k} h(x_j, y_k) = 1.$

(4.6)    $\sum_{k=1}^{N} h(x_j, y_k) = f(x_j)$    for $j = 1, 2, \ldots, M$.

(4.7)    $\sum_{j=1}^{M} h(x_j, y_k) = g(y_k)$    for $k = 1, 2, \ldots, N$.


The inequality in (4.4) expresses the obvious fact that the probability
of the joint occurrence of the events $X = x_j$ and $Y = y_k$ is
nonnegative. We note, however, that although we have assumed in
(4.2) and (4.3) that the events $X = x_j$ and $Y = y_k$ occur with positive
probability, we must allow in (4.4) for the possibility that the
intersection of these events is the empty set, and thus has probability zero.


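TABLE 29

  x \ y       |  y_1          y_2          ...  y_k          ...  y_N          | P(X = x_j)
  x_1         |  h(x_1,y_1)   h(x_1,y_2)   ...  h(x_1,y_k)   ...  h(x_1,y_N)   | f(x_1)
  x_2         |  h(x_2,y_1)   h(x_2,y_2)   ...  h(x_2,y_k)   ...  h(x_2,y_N)   | f(x_2)
  ...         |  ...
  x_j         |  h(x_j,y_1)   h(x_j,y_2)   ...  h(x_j,y_k)   ...  h(x_j,y_N)   | f(x_j)
  ...         |  ...
  x_M         |  h(x_M,y_1)   h(x_M,y_2)   ...  h(x_M,y_k)   ...  h(x_M,y_N)   | f(x_M)
  P(Y = y_k)  |  g(y_1)       g(y_2)       ...  g(y_k)       ...  g(y_N)       |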


In (4.5), we merely observe that the sum of all $MN$ probabilities
in the joint probability table (not including the entries listed in the
margins) is 1. This sum can be calculated in any number of ways,
but two methods are especially noteworthy. We can sum each row
first, then add the row-sums; i.e.,

(4.8)    $\sum_{\text{all } j,k} h(x_j, y_k) = \sum_{j=1}^{M} \sum_{k=1}^{N} h(x_j, y_k) = \sum_{j=1}^{M} f(x_j) = 1;$

or we can sum each column first, then add the column-sums; i.e.,

(4.9)    $\sum_{\text{all } j,k} h(x_j, y_k) = \sum_{k=1}^{N} \sum_{j=1}^{M} h(x_j, y_k) = \sum_{k=1}^{N} g(y_k) = 1.$


In the first method, the row-sums are the probabilities with which
the possible values of $X$ occur. This fact is recorded in (4.6) and
follows from the observation that the event $X = x_j$ occurs whenever
one of the joint events $(X = x_j, Y = y)$ occurs for some value $y$ of
the random variable $Y$. For different values of $y$, these joint events
are clearly mutually exclusive, and so

$$P(X = x_j) = \sum_{y} P(X = x_j, Y = y).$$

This equality is equivalent to (4.6), since only possible values of $Y$
can contribute to the sum. We similarly can show that the sum of
the entries in any column of the joint table is the probability with
which the value of $Y$ determining that column occurs. This fact is
recorded in (4.7).

Thus we see that from the joint probability table we can recover
the probability functions of the random variables $X$ and $Y$ by adding
rows and columns. Since the resulting probabilities $f(x_j)$ for
$j = 1, 2, \ldots, M$ and $g(y_k)$ for $k = 1, 2, \ldots, N$ are written in the margins
of the table, these probabilities are known as marginal probabilities,
and $f$ and $g$ are referred to as the marginal probability functions of $X$
and $Y$, respectively. In both cases we note that the adjective
"marginal" is technically redundant.
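Properties (4.5)-(4.7) amount to summing a two-way table by rows and by columns. The Python sketch below does this for the joint probabilities of $X$ and $Y$ found in Example 4.1; the nested-dictionary layout and the alias `F` are assumptions chosen for brevity, not notation from the text.

```python
from fractions import Fraction as F

# Joint probabilities h(x, y) of Example 4.1: outer key x = 0, 1; inner key y = 0, 1, 2, 3.
table = {
    0: {0: F(1, 8), 1: F(1, 4), 2: F(1, 8), 3: F(0)},
    1: {0: F(0), 1: F(1, 8), 2: F(1, 4), 3: F(1, 8)},
}

# (4.6): row-sums give the marginal probability function f of X.
f = {x: sum(row.values()) for x, row in table.items()}

# (4.7): column-sums give the marginal probability function g of Y.
g = {y: sum(table[x][y] for x in table) for y in table[0]}

# (4.8) and (4.9): adding the row-sums or the column-sums gives the same total.
print(f, sum(f.values()))
print(g, sum(g.values()))
```

Either order of summation visits every cell exactly once, which is why both totals agree with (4.5).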

Let us now turn to the task of defining the important concept of 
independent random variables, to which we alluded at the end of 
our discussion in Example 4.1. (We know the meaning of independent 
events and independent trials from Chapter 2.) The following defini- 
tion seems reasonable, in view of our observations in Example 4.1. 


Definition 4.2. Two random variables $X$ and $Y$ defined on the
same sample space $S$ are said to be independent if and only if

(4.10)    $P(X = x_j, Y = y_k) = P(X = x_j)P(Y = y_k)$

for $j = 1, 2, \ldots, M$ and $k = 1, 2, \ldots, N$. In other words, "$X$ and
$Y$ are independent random variables" means that the events $X = x_j$
and $Y = y_k$ are independent events for all pairs of possible values $x_j$
and $y_k$. Random variables that are not independent are said to be
dependent.


Equivalently, we see that $X$ and $Y$ are independent random variables
if and only if

(4.11)    $h(x_j, y_k) = f(x_j)g(y_k),$

i.e., if and only if the joint probability table assumes the form of a
multiplication table in which $h(x_j, y_k)$, the entry in any row and
column, is the product of $f(x_j)$, the probability in the row margin, and
$g(y_k)$, the probability in the column margin.

With this definition before us, a quick glance at Tables 27 and 28 
shows that in Example 4.1, as we anticipated, X and Y are dependent 
random variables, but X and Z are independent. 


Example 4.2. An urn contains three red and two green balls. A
random sample of two balls is drawn (a) with replacement, and (b)
without replacement. In either case, we define

X = 0 if the first ball is green,
    1 if the first ball is red,

Y = 0 if the second ball is green,
    1 if the second ball is red.

We find the two joint probability tables given in Table 30. Note that,
in (a), $X$ and $Y$ are identically distributed and are independent
random variables. In (b), $X$ and $Y$ are also identically distributed, but
now they are dependent random variables. Although it is always
possible to derive the probability functions of $X$ and $Y$ from their
joint probability function, as this example demonstrates it is generally
impossible to reconstruct the joint probability table if only the
marginal probabilities of $X$ and $Y$ are known.

TABLE 30

(a) with replacement                      (b) without replacement

  x \ y      |  0      1     | P(X = x)     x \ y      |  0      1     | P(X = x)
  0          |  4/25   6/25  | 2/5          0          |  1/10   3/10  | 2/5
  1          |  6/25   9/25  | 3/5          1          |  3/10   3/10  | 3/5
  P(Y = y)   |  2/5    3/5   | 1            P(Y = y)   |  2/5    3/5   | 1

Further insight into the reasonableness of our definition of
independence of random variables is obtained by looking at conditional
probabilities. Suppose we are interested in the event that $X$ has the
value $x_j$ given that $Y$ has the value $y_k$. We find directly from the
definition of conditional probability that

(4.12)    $P(X = x_j \mid Y = y_k) = \dfrac{P(X = x_j, Y = y_k)}{P(Y = y_k)} = \dfrac{h(x_j, y_k)}{g(y_k)}.$

(Recall that in (4.3) we have assumed $g(y_k) \ne 0$.) If we write

(4.13)    $f(x_j \mid y_k) = P(X = x_j \mid Y = y_k),$

then for fixed $k$, we have a function defined with domain the set of
possible values of the random variable $X$. To distinguish clearly the
function from its value, we shall write $f(\cdot \mid y_k)$ for the function and
$f(x_j \mid y_k)$ for the value of the function at $x = x_j$. We have $N$ such
functions, one for each possible value $y_k$ of $Y$. In terms of a
function-machine, the inputs of the $f(\cdot \mid y_k)$-machine are the possible values
of the random variable $X$. If $x_j$ is the input, then the corresponding
output number is the conditional probability $f(x_j \mid y_k)$ given in (4.13).

We now show that each of the functions $f(\cdot \mid y_k)$ is a probability
function. By Theorem 1.3, it suffices to show that the values of the
function are nonnegative and that these values add to 1. Since
$f(x_j \mid y_k)$ is defined in (4.13) as a probability, it is clearly
nonnegative. Furthermore,

(4.14)    $\sum_{j=1}^{M} f(x_j \mid y_k) = \sum_{j=1}^{M} \dfrac{h(x_j, y_k)}{g(y_k)} = \dfrac{1}{g(y_k)} \sum_{j=1}^{M} h(x_j, y_k) = \dfrac{g(y_k)}{g(y_k)} = 1.$

Hence $f(\cdot \mid y_k)$ is a probability function. It is important enough to
deserve a special name.


Definition 4.3. Let $y_k$ be any possible value of $Y$. The function
$f(\cdot \mid y_k)$ whose domain is the set of possible values of $X$ and whose
value $f(x_j \mid y_k)$ is given by (4.13), or by (4.12), is called the conditional
probability function of $X$, given $Y = y_k$. The conditional probability
function of $Y$, given $X = x_j$, is similarly defined as the function $g(\cdot \mid x_j)$
whose value at $y_k$ is given by

(4.15)    $g(y_k \mid x_j) = P(Y = y_k \mid X = x_j) = \dfrac{h(x_j, y_k)}{f(x_j)}.$


Example 4.3. Refer to Example 4.2 and suppose that $Y$ has the
value 1, i.e., that the second ball drawn is red. We want to determine
the conditional probability function of $X$ when the balls are drawn
(a) with replacement, and (b) without replacement. We must therefore
calculate $f(0 \mid 1)$ and $f(1 \mid 1)$ in both cases. The probabilities we
need in order to use Formula (4.12) can be read directly from Table
30, and we obtain the following results.

(a) With replacement:

$$f(0 \mid 1) = \frac{h(0, 1)}{g(1)} = \frac{6/25}{3/5} = \frac{2}{5}, \qquad f(1 \mid 1) = \frac{h(1, 1)}{g(1)} = \frac{9/25}{3/5} = \frac{3}{5}.$$

(b) Without replacement:

$$f(0 \mid 1) = \frac{h(0, 1)}{g(1)} = \frac{3/10}{3/5} = \frac{1}{2}, \qquad f(1 \mid 1) = \frac{h(1, 1)}{g(1)} = \frac{3/10}{3/5} = \frac{1}{2}.$$

Observe that in case (a), where $X$ and $Y$ are independent, the
conditional probability function of $X$ given $Y = 1$ has the same values
as the probability function of $X$; i.e., $f(0 \mid 1) = f(0)$ and $f(1 \mid 1) = f(1)$.
But in case (b), where $X$ and $Y$ are dependent, knowing that the
second ball drawn is red changes the probabilities of drawing a red
or green ball on the first draw. For example, $f(1) = 3/5$ is the
probability that the first ball is red in the absence of any information.
When we are told that the second ball drawn is red, the conditional
probability that the first ball is red decreases to $f(1 \mid 1) = 1/2$.
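The joint tables of Example 4.2 and the conditional probabilities of Example 4.3 can be checked by listing all equally likely ordered draws. The Python sketch below is one way to do so; the function names `joint_table` and `conditional_of_x` are hypothetical, and treating the five balls as distinguishable objects is an assumption that makes every ordered pair equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

balls = ["R", "R", "R", "G", "G"]  # three red, two green, as in Example 4.2


def joint_table(pairs):
    """Joint probability function of (X, Y) from equally likely ordered draws."""
    pairs = list(pairs)
    counts = Counter((int(a == "R"), int(b == "R")) for a, b in pairs)
    return {xy: Fraction(n, len(pairs)) for xy, n in counts.items()}


def conditional_of_x(h, y):
    """Values f(x | y) of the conditional probability function of X given Y = y."""
    g_y = sum(p for (_, yk), p in h.items() if yk == y)  # marginal g(y), a column-sum
    return {x: p / g_y for (x, yk), p in h.items() if yk == y}


with_repl = joint_table(product(balls, repeat=2))    # 25 ordered pairs
without_repl = joint_table(permutations(balls, 2))   # 20 ordered pairs

print(conditional_of_x(with_repl, 1))     # compare with f(0) = 2/5, f(1) = 3/5
print(conditional_of_x(without_repl, 1))  # differs from the marginal of X
```

Dividing each entry of the column $y = 1$ by the column total is exactly Formula (4.12), so the returned values automatically add to 1, in agreement with (4.14).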


The following theorem formulates the definition of independence of
random variables in terms of the conditional probability function.
We leave the proof for the problems.

Theorem 4.1. The random variables $X$ and $Y$ are independent if
and only if, for every possible value $y_k$ of $Y$, the conditional
probability function of $X$ given $Y = y_k$ and the (marginal) probability
function of $X$ have equal values for each possible value of $X$; i.e.,
if and only if

(4.16)    $f(x_j \mid y_k) = f(x_j)$    for $j = 1, 2, \ldots, M$; $k = 1, 2, \ldots, N$.

This result shows that $X$ and $Y$ are independent whenever knowing
the value of $Y$ does not change the probability with which $X$ has any
of its values, or equivalently (see Problem 4.14), whenever knowing
the value of $X$ does not change the probability with which $Y$ has any
of its values.

We shall return to conditional probability functions in a later
section, but now we take up two results that are strongly suggested by
our intuition. The first of these asserts that if $X$ and $Y$ are
independent, then so are $u(X)$ and $v(Y)$ for any functions $u$ and $v$. For
example, if $X$ and $Y$ are independent, then with $u(x) = x - \mu_X$ and
$v(y) = y - \mu_Y$ this theorem will permit us to conclude that $X - \mu_X$
and $Y - \mu_Y$ are independent; with $u(x) = x^2$ and $v(y) = y^2$, that $X^2$
and $Y^2$ are independent; etc.

Theorem 4.2. Let $X$ and $Y$ be independent random variables. Let
$u$ and $v$ be functions for which $u(X)$ and $v(Y)$ are defined in the sense
of Definition 2.2. Then $u(X)$ and $v(Y)$ are also independent random
variables.


Proof. According to Definition 4.2, to prove $u(X)$ and $v(Y)$ are
independent it suffices to prove that

$$P(u(X) = x, v(Y) = y) = P(u(X) = x)P(v(Y) = y)$$

for every pair of numbers $x$ and $y$. Now

$$P(u(X) = x, v(Y) = y) = \sum^{*} P(X = x_j, Y = y_k),$$

where the asterisk indicates that the sum is to be taken over only
those values of $j$ and $k$ for which $u(x_j) = x$ and $v(y_k) = y$. Since $X$
and $Y$ are independent by hypothesis, we can apply (4.10) to obtain

$$P(u(X) = x, v(Y) = y) = \sum^{*} P(X = x_j)P(Y = y_k)
= \Big[\sum_{j:\,u(x_j) = x} P(X = x_j)\Big]\Big[\sum_{k:\,v(y_k) = y} P(Y = y_k)\Big]
= P(u(X) = x)P(v(Y) = y),$$

and the proof is complete.
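Theorem 4.2 can be illustrated numerically. In the Python sketch below, $X$ and $Y$ are taken to be the numbers shown by two fair dice, made independent by construction, and the functions $u(x) = (x - 7/2)^2$ and $v(y) = y \bmod 2$ are arbitrary choices; none of these specifics come from the text. The helper `pushforward` groups the joint probabilities exactly as the starred sum in the proof does.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint probability function of two independent fair dice: h(x, y) = 1/36.
h = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}


def u(x):
    return (x - Fraction(7, 2)) ** 2  # squared deviation from the mean 7/2


def v(y):
    return y % 2                      # 1 if y is odd, 0 if y is even


def pushforward(h, u, v):
    """Joint probability function of (u(X), v(Y)): sum h over matching (x, y)."""
    out = {}
    for (x, y), p in h.items():
        key = (u(x), v(y))
        out[key] = out.get(key, 0) + p
    return out


def is_independent(h):
    """True if every cell equals the product of its marginal totals, as in (4.11)."""
    f, g = {}, {}
    for (a, b), p in h.items():
        f[a] = f.get(a, 0) + p
        g[b] = g.get(b, 0) + p
    return all(h.get((a, b), 0) == f[a] * g[b] for a in f for b in g)


k = pushforward(h, u, v)
print(is_independent(h), is_independent(k))
```

A single numerical check is of course no substitute for the proof; it only shows the factorization of the starred sum at work in one concrete case.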
Our next result concerns two random variables X and Y such that 
the value of X is determined by the first trial and the value of Y is 
determined by the second trial of a two-trial experiment. If the trials 
are independent (as defined in Section II.9), then we would be dis- 
satisfied with our theory if it did not enable us to prove that X and Y 
are independent random variables. For example, let X and Y denote 
respectively the sum obtained in the first and second rolls of a pair 
of dice. If the two rolls (trials) are independent, then we expect that 
$X$ and $Y$ are independent random variables. We leave for the reader
the task of showing that the following result is an immediate
consequence of Theorem II.9.1.

Theorem 4.3. Let an experiment consist of two independent trials. 
If the value of a random variable X is determined by the first trial 
and the value of a random variable Y is determined by the second 
trial, then X and Y are independent. 

We conclude this section by recording for later use the extension 
of some of our results to the case where more than two random vari- 
ables are defined on the same sample space. First we make the natural 
extension of Definition 4.2. 

Definition 4.4. Let $n$ be any positive integer greater than 1. The
random variables $V_1, V_2, \ldots, V_n$ defined on a sample space $S$ are
said to be independent if and only if

(4.17)    $P(V_1 = v_1, V_2 = v_2, \ldots, V_n = v_n) = P(V_1 = v_1)P(V_2 = v_2) \cdots P(V_n = v_n)$

for all combinations of possible values $v_1$ of $V_1$, $v_2$ of $V_2$, $\ldots$, $v_n$ of $V_n$.
In other words, "$V_1, V_2, \ldots, V_n$ are independent random variables"
means that $V_1 = v_1, V_2 = v_2, \ldots, V_n = v_n$ are independent events
(in the sense of Definition II.8.3) for all possible values $v_1, v_2, \ldots, v_n$.

Corresponding to Theorems 4.2 and 4.3 we have the following
results whose proofs we leave for the reader.

Theorem 4.4. Let $V_1, V_2, \ldots, V_n$ be independent random variables.
Let $u_1, u_2, \ldots, u_n$ be functions for which $u_1(V_1), u_2(V_2), \ldots, u_n(V_n)$
are defined in the sense of Definition 2.2. Then $u_1(V_1), u_2(V_2), \ldots,$
$u_n(V_n)$ are also independent random variables.

Theorem 4.5. Let an experiment consist of $n$ independent trials.
If the value of random variable $V_j$ is determined by the $j$th trial for
$j = 1, 2, \ldots, n$, then the random variables $V_1, V_2, \ldots, V_n$ are
independent.

We continue our study of joint probability functions in the next
section.


PROBLEMS 


variables defined in Example 4.1. Con- 
Z, sketch the corresponding 
etermine whether or not Y 


41. Let Y and Z be the random 
struct the joint probability table of Y and 
three-dimensional probability chart, and d 
and Z are independent. 
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4.6. 


4. 


4.8. 


RANDOM VARIABLES / Chap. 4 


Modify Example 4.1 by redefining Z as the algebraic difference of the 
number of tails from the number of heads. Construct the joint prob- 
ability table of X, Z and sketch the corresponding three-dimensional 
probability chart. Are X and Z independent? 


Suppose three indistinguishable objects are distributed at random into 
three numbered cells. Let X be the number of empty cells and Y the 
number of objects in the first cell. Construct the joint probability table 
of X and Y. Are X and Y independent? 


Let X be the larger of the two numbers and Y be the sum of the 


numbers showing when two fair dice are rolled. Construct the joint 


probability table of X and Y. Are X and Y independent random 
variables? 


Let X denote the number of spades and Y the number of hearts in a 
bridge hand. Write a formula for 

A(z, y) = P(X = 2, Y = y) 
and prove that X and Y are dependent. 
Three cards are drawn from the 12 face cards of an ordinary deck. Let 
X be the number of red jacks and Y be the number of red queens. 
Construct the joint probability table of X and Y, sketch the correspond- 
ing three-dimensional probability chart, and show that X and Y are 
identically distributed but dependent random variables. 


A fair coin is tossed four independent times. Let X be the number of 
heads obtained in the first two tosses and Y the number of heads ob- 
tained in the last two tosses, Construct the joint probability table of X 
and Y and sketch the corresponding three-dimensional probability 


chart. Show that X and Y are independent by using Definition 4.2 and 
also by invoking Theorem 4.3. 


The joint probability function of X and Y is given by

$h(x, y) = \frac{1}{32}(x^2 + y^2)$   for x = 0, 1, 2, 3 and y = 0, 1.

(a) Show that the marginal probability function of X is given by
$f(x) = \frac{1}{32}(2x^2 + 1)$   for x = 0, 1, 2, 3.
(b) Show that the marginal probability function of Y is given by
$g(y) = \frac{1}{16}(2y^2 + 7)$   for y = 0, 1.
(c) Show that the conditional probability function of X given Y = y
is given by
$f(x \mid y) = \frac{x^2 + y^2}{2(2y^2 + 7)}$   for x = 0, 1, 2, 3 and y = 0, 1.
(d) Show that the conditional probability function of Y given X = x
is given by
$g(y \mid x) = \frac{x^2 + y^2}{2x^2 + 1}$   for x = 0, 1, 2, 3 and y = 0, 1.

4.9. We select one of the integers 1, 2, 3, 4, 5. After discarding all integers
(if any) less than the selected integer, we draw one of the remaining
integers. (For example, if we select 3 first, then the second draw is
made from the integers 3, 4, 5.) Let X and Y denote the numbers
obtained on the first and second draws, respectively.
(a) Construct the joint probability table of X and Y.
(b) Determine the marginal probability functions of X and of Y.
(c) Determine the conditional probability function of Y given X = 3.
(d) Determine the conditional probability function of X given Y = 3.
(e) Find $P(X + Y > 7)$ and $P(Y - X = 0)$.

4.10. Suppose X has only one possible value, this value therefore occurring
with probability 1. Show that X and Y are independent for any random
variable Y.

4.11. For the random variables X and Z defined in Example 4.1 of the text,
determine the value of $P(X \le x, Z \le z)$ for all real numbers x and z.
(Hint. Refer to Figure 24(a) and divide the x-z plane into a number
of regions such that $P(X \le x, Z \le z)$ has the same value at all points
of any one region.)

4.12. The joint distribution function of X and Y is defined for all real numbers
x and y by the equation

$H(x, y) = P(X \le x, Y \le y).$

(a) If h is the joint probability function of X and Y, show that

$H(x, y) = \sum h(x_j, y_k),$

where the sum is taken over all j- and k-values for which $x_j \le x$
and $y_k \le y$. (This is the bivariate analogue of the formula in
Problem 1.10.)
(b) Let a, b, c, d be any numbers with a < b and c < d. Show that
(cf. Problem 1.11)

$P(a < X \le b, c < Y \le d) = H(b, d) - H(a, d) - H(b, c) + H(a, c).$

(c) Show that for fixed x, H(x, y) is a nondecreasing function of y, and
that for fixed y, H(x, y) is a nondecreasing function of x.
(d) Show that numbers r and R exist so that H(x, y) = 0 if x < r and
y < r, whereas H(x, y) = 1 if $x \ge R$ and $y \ge R$.
(e) Above each point (x, y) in the x-y plane, imagine a point drawn at
the height H(x, y). The set of all points drawn in this way is a
surface which is the three-dimensional graph of the joint distribution
function H. Describe the kind of surface one obtains.


4.13. Let X and Y be independent random variables. Choose any two rows of
their joint probability table. Show that there is a number (which will
depend on the rows you choose) such that the probabilities in one row
are obtained by multiplying the corresponding probabilities in the other
row by this number.


4.14. Let $g(y_k \mid x_j)$ denote the conditional probability of $Y = y_k$, given
$X = x_j$. Show that if $f(x_j \mid y_k) = f(x_j)$ for all possible values $x_j$ of X
and $y_k$ of Y, then for all these values also $g(y_k \mid x_j) = g(y_k)$.

4.15. (a) Prove Theorem 4.1. 

(b) Prove Theorem 4.3. 
(c) Show that the converse of Theorem 4.2 is false by giving an example
of two dependent random variables X and Y for which $X^2$ and $Y^2$
are independent.


4.16. (a) Show from Definition 4.4 that if $V_1, V_2, \ldots, V_n$ are independent,
then any smaller number of random variables taken from these n
are also independent.

(b) Let an experiment consist of n independent trials. For any positive 
integer k < n, we can think of this experiment as made up of two 
supertrials, the first k trials being the first supertrial and the last 
n — k trials being the second supertrial. Show that these super- 
trials are independent, and hence conclude from Theorem 4.3 that 
if X is a random variable determined by the first k trials and Y is
a random variable determined by the last n - k trials, then X and
Y are independent.


5. Mean and variance of sums of random variables; the sample mean 


We shall see in this section that if two random variables X and Y
are defined on a sample space S, then there are automatically many
other random variables also defined on S. In particular, the sum
X + Y and the product XY turn out to be especially important.
We also extend our results to the case where more than two random
variables are defined on S, and are then able to prove some theorems
of great interest in the branch of statistics known as sampling theory.

Our first task is to develop the bivariate analogue of Theorem 2.1
as an aid to computing means of random variables that are functions
of X and Y. Suppose z is a numerical-valued function whose domain
is a set of ordered pairs of real numbers and let z(x, y) denote the
value of z at the ordered pair or point (x, y). If the domain of z
includes all the ordered pairs of values of X and Y, then for each
element $o_i \in S$ we can first find the corresponding values $X(o_i)$ and $Y(o_i)$,
and then evaluate z at the point $(X(o_i), Y(o_i))$. See Figure 25. In
this way, to $o_i \in S$ (the input) we make correspond the (output) real
number $z(X(o_i), Y(o_i))$, and thus we have a new random variable
defined on S. This random variable is denoted by z(X, Y).

[Figure 26: the mapping $o_i \in S \to (X(o_i), Y(o_i)) \to z(X(o_i), Y(o_i))$]

For example, if z(x, y) = x + y, then

$z(X, Y) = X + Y$

is the sum of the random variables X and Y; if

$z(x, y) = (x - \mu_X)(y - \mu_Y),$

then

$z(X, Y) = (X - \mu_X)(Y - \mu_Y)$

is the product of the deviations of X and Y from their respective
means; etc. In the example that follows, we illustrate how to determine
the probability function of the random variable z(X, Y) from
the joint probability table of X and Y. The mean of z(X, Y) can
then easily be computed.


Example 5.1. Consider the random variables X and Y of Example
4.1. The possible values of X and Y, together with their joint
probabilities, are given in Table 27. Let z(x, y) = x + y so that
U = z(X, Y) = X + Y. From the joint probability table, we can
determine the possible values of U as well as the probability with
which each value occurs. For example,

$P(U = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 1) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}.$

In this way, we obtain the entries in the following probability table
for the random variable U = X + Y:

    u           0     1     2     3     4
    P(U = u)   1/8   1/4   1/4   1/4   1/8

From this table, we calculate the mean of U:

$E(U) = E(X + Y) = 0(\frac{1}{8}) + 1(\frac{1}{4}) + 2(\frac{1}{4}) + 3(\frac{1}{4}) + 4(\frac{1}{8}) = 2.$

From the marginal probability functions of X and Y, also given in
Table 27, we find that

$E(X) = 0(\frac{1}{2}) + 1(\frac{1}{2}) = \frac{1}{2}, \quad E(Y) = 0(\frac{1}{8}) + 1(\frac{3}{8}) + 2(\frac{3}{8}) + 3(\frac{1}{8}) = \frac{3}{2}.$

Observe that E(X + Y) = E(X) + E(Y), a result that we will soon
establish for all random variables X and Y.

If we define z(x, y) as the product rather than the sum of x and y,
then V = z(X, Y) = XY is a random variable whose probability
table is similarly found:

    v           0     1     2     3
    P(V = v)   1/2   1/8   1/4   1/8

Now we compute the mean of V,

$E(V) = E(XY) = 0(\frac{1}{2}) + 1(\frac{1}{8}) + 2(\frac{1}{4}) + 3(\frac{1}{8}) = 1.$

Observe that $E(XY) \ne E(X)E(Y)$.
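
The collecting procedure of Example 5.1 can be sketched in a few lines of Python. This is only an illustration, not part of the text: the joint table below is our reading of Table 27, assuming Example 4.1 is three tosses of a fair coin with X the number of heads on the first toss and Y the total number of heads (which agrees with the marginal means quoted above). The names `joint`, `distribution_of`, and `mean` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

# Assumed joint probabilities h(x, y) for Table 27 (our reconstruction):
# X = heads on the first toss, Y = total heads in three fair tosses.
joint = {
    (0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8), (0, 3): F(0),
    (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(2, 8), (1, 3): F(1, 8),
}

def distribution_of(z, joint):
    """Add up the probabilities of all cells (x, y) giving the same z(x, y)."""
    dist = defaultdict(F)
    for (x, y), p in joint.items():
        if p != 0:                      # impossible cells contribute nothing
            dist[z(x, y)] += p
    return dict(dist)

def mean(dist):
    """Mean of a random variable given as {value: probability}."""
    return sum(value * p for value, p in dist.items())

dist_u = distribution_of(lambda x, y: x + y, joint)   # U = X + Y
dist_v = distribution_of(lambda x, y: x * y, joint)   # V = XY

print(sorted(dist_u.items()), mean(dist_u))
print(sorted(dist_v.items()), mean(dist_v))
```

Exact fractions are used so that the results can be compared directly with the two probability tables of the example.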


The reader should note that what we do to determine the probability
function of z(X, Y) is collect all possible pairs of X and Y
values that lead to the same value of z(X, Y). But it is more
convenient not to do this when we want to compute the mean of z(X, Y).
The following result tells us how to compute E[z(X, Y)] directly
from the joint probability table of X and Y without first determining
the probability function of z(X, Y). The proof is similar to that of
Theorem 2.1, and we leave it for the problems.

Theorem 5.1. Let X and Y be random variables with joint probability
function h. Then

(5.1)    $E[z(X, Y)] = \sum_{\text{all } j,k} z(x_j, y_k)\, h(x_j, y_k).$

In words, we find E[z(X, Y)] by moving from cell to cell in the
joint probability table of X and Y, multiplying the value of z(X, Y)
corresponding to each cell by the probability appearing in that cell,
and then adding these products for all cells.


Example 5.2. Refer to Example 5.1 and let us illustrate the use of
Formula (5.1) by recalculating the means of X + Y and XY. We
find directly from Table 27, moving across the first row and then the
second,

$E(X + Y) = 0(\frac{1}{8}) + 1(\frac{1}{4}) + 2(\frac{1}{8}) + 3(0) + 1(0) + 2(\frac{1}{8}) + 3(\frac{1}{4}) + 4(\frac{1}{8}) = 2,$

as before.

There is of course no need to write down terms that have zero factors.
Indeed, any cell in the joint probability table for which either
$z(x_j, y_k) = 0$ or $h(x_j, y_k) = 0$ can be skipped in computing E[z(X, Y)].
For example, we note that XY has the value 0 for five of the eight
cells in Table 27. Hence we skip these and find, as in Example 5.1,

$E(XY) = 1(\frac{1}{8}) + 2(\frac{1}{4}) + 3(\frac{1}{8}) = 1.$
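
As a rough illustration of Formula (5.1), the following Python sketch walks over the cells of a joint table and sums z(x, y) h(x, y). The table is again our assumed reading of Table 27 (three fair tosses, X = heads on the first toss, Y = total heads), and the function name `expectation` is hypothetical.

```python
from fractions import Fraction as F

# Assumed joint probabilities h(x, y) for Table 27 (our reconstruction).
joint = {
    (0, 0): F(1, 8), (0, 1): F(2, 8), (0, 2): F(1, 8), (0, 3): F(0),
    (1, 0): F(0),    (1, 1): F(1, 8), (1, 2): F(2, 8), (1, 3): F(1, 8),
}

def expectation(z, joint):
    """E[z(X, Y)] by Formula (5.1): sum of z(x, y) * h(x, y) over all cells."""
    return sum(z(x, y) * p for (x, y), p in joint.items())

e_x = expectation(lambda x, y: x, joint)
e_y = expectation(lambda x, y: y, joint)
e_sum = expectation(lambda x, y: x + y, joint)
e_prod = expectation(lambda x, y: x * y, joint)

print(e_x, e_y, e_sum, e_prod)
```

No probability function of X + Y or XY is built along the way, which is exactly the convenience the theorem offers.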


Theorem 5.1 enables us to prove the following extremely important
and often-used results.

Theorem 5.2. Let X and Y be any random variables defined on a
sample space S. Then

(5.2)    $E(X + Y) = E(X) + E(Y).$

In words, the mean of the sum of two random variables is equal to the
sum of their means.

Proof. According to Formula (5.1) we have

$E(X + Y) = \sum_{\text{all } j,k} (x_j + y_k)\, h(x_j, y_k) = \sum_{\text{all } j,k} x_j h(x_j, y_k) + \sum_{\text{all } j,k} y_k h(x_j, y_k).$

In the first term on the right, we sum over rows and then add the
row-sums; in the second term, we sum over columns and then add
the column-sums. We recall (4.6) and (4.7) and find

$E(X + Y) = \sum_{j=1}^{M} x_j \sum_{k=1}^{N} h(x_j, y_k) + \sum_{k=1}^{N} y_k \sum_{j=1}^{M} h(x_j, y_k)$
$\qquad = \sum_{j=1}^{M} x_j f(x_j) + \sum_{k=1}^{N} y_k g(y_k)$
$\qquad = E(X) + E(Y).$


Combining this result with Formula (2.7), we see that for any con- 
stants a and b, 


(5.3) E(aX + bY) = aE(X) + bE(Y). 


Still more general is the following theorem. 


Theorem 5.3. Let n be any positive integer. If $X_1, X_2, \ldots, X_n$
are any random variables defined on a sample space S, and if $a_1, a_2,
\ldots, a_n$ are any constants, then*

(5.4)    $E(a_1X_1 + a_2X_2 + \cdots + a_nX_n) = a_1E(X_1) + a_2E(X_2) + \cdots + a_nE(X_n).$


Proof. The result is true for n = 1 and n = 2 by Formula (5.3).
The theorem is proved (by mathematical induction) as soon as we
show that if the theorem is true for any positive integer, say n = k,
then it is also true for the next integer, n = k + 1. Let us therefore
assume that (5.4) is true for n = k. That is, letting
$Y = a_1X_1 + \cdots + a_kX_k$, we are assuming

$E(Y) = a_1E(X_1) + \cdots + a_kE(X_k).$

The key idea of the proof is the observation that the sum of k + 1
random variables can be thought of as the sum of two random
variables to which (5.3) can be applied. In particular,

$E(a_1X_1 + \cdots + a_kX_k + a_{k+1}X_{k+1}) = E(Y + a_{k+1}X_{k+1})$
$\qquad = E(Y) + a_{k+1}E(X_{k+1})$
$\qquad = a_1E(X_1) + \cdots + a_kE(X_k) + a_{k+1}E(X_{k+1}).$

But this last equality shows that (5.4) is true for n = k + 1, and so
the proof is complete.


* Strictly speaking, the sum $a_1X_1 + a_2X_2 + \cdots + a_nX_n$ appearing in Formula
(5.4) has been defined only if n = 1 or n = 2. We make the natural definition
that the sum for any positive integer n is the random variable whose value at
each $o_i \in S$ is the number $a_1X_1(o_i) + a_2X_2(o_i) + \cdots + a_nX_n(o_i)$.

Example 5.3. We apply (5.4) to derive a useful identity:

$E[(X - \mu_X)(Y - \mu_Y)] = E(XY - \mu_X Y - \mu_Y X + \mu_X\mu_Y)$
$\qquad = E(XY) - \mu_X E(Y) - \mu_Y E(X) + \mu_X\mu_Y.$

Except for sign the last three terms are equal. Hence

(5.5)    $E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X\mu_Y.$

We turn now to some results leading to a formula for the variance 
of a sum of random variables. 

Theorem 5.4. Let X and Y be independent random variables de- 
fined on a sample space S. Then 
(5.6)    $E(XY) = E(X)E(Y).$
In words, the mean of the product of two independent random variables 
is equal to the product of their means. 

Proof. By Theorem 5.1 we write

$E(XY) = \sum_{\text{all } j,k} x_j y_k\, h(x_j, y_k).$

But the assumed independence of X and Y means, according to
(4.11), that $h(x_j, y_k) = f(x_j)g(y_k)$ for all j and k. Hence

$E(XY) = \sum_{\text{all } j,k} x_j y_k f(x_j) g(y_k)$
$\qquad = \sum_{j=1}^{M} x_j f(x_j) \sum_{k=1}^{N} y_k g(y_k)$
$\qquad = E(X)E(Y).$

It is very important to note that the converse of Theorem 5.4 is
false. As the following example shows, it is possible for (5.6) to be
true for random variables X and Y that are dependent.


Example 5.4. Suppose X has probability table

    x          -1     0     1
    P(X = x)   1/4   1/2   1/4

Let $Y = X^2$. Then X and Y are surely dependent, since the value of
X determines the value of Y. This dependence is obvious from the
joint probabilities of X and Y, as given in Table 31.

TABLE 31

                    x
    y         -1     0     1     g(y)
    0          0    1/2    0     1/2
    1         1/4    0    1/4    1/2
    f(x)      1/4   1/2   1/4     1

Nevertheless, the reader can quickly check that E(X) = 0, E(Y) = 1/2
and $E(XY) = E(X^3) = 0$, so that (5.6) holds. Recalling that (5.6)
did not hold for the dependent random variables in Example 5.1, we
conclude that (5.6) holds for all pairs of independent random variables
and some but not all pairs of dependent random variables.
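
The point of Example 5.4 can be checked mechanically. The sketch below assumes that X takes the values -1, 0, 1 with probabilities 1/4, 1/2, 1/4 (an assumption on our part, chosen to agree with the surviving entries of Table 31) and that $Y = X^2$. It verifies that (5.6) holds while at least one cell of the joint table differs from the product of the marginals. All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F

# Assumed probability function of X (our reading of the example).
f = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}

# Joint probabilities of X and Y = X**2: only cells (x, x*x) are possible.
joint = {(x, x * x): p for x, p in f.items()}

# Marginal probability function g of Y.
g = defaultdict(F)
for (x, y), p in joint.items():
    g[y] += p

e_x = sum(x * p for x, p in f.items())
e_y = sum(y * p for y, p in g.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

# X and Y are dependent if some cell differs from f(x) * g(y).
dependent = any(joint.get((x, y), F(0)) != f[x] * g[y] for x in f for y in g)

print(e_x, e_y, e_xy, dependent)
```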


It is convenient to record here the following corollary:

(5.7)    $E[(X - \mu_X)(Y - \mu_Y)] = 0$   if X, Y are independent.

This follows immediately from the identity in (5.5) if we apply
Theorem 5.4.


We are now able to state a rule for finding the variance of the sum 
of two independent random variables. 


Theorem 5.5. Let X and Y be independent random variables de- 
fined on a sample space S. Then 


(5.8) Var(X + Y) = Var(X) + Var(Y). 


In words, the variance of the sum of two independent random variables
is equal to the sum of their variances.

Proof. By definition of variance, we have

$\mathrm{Var}(X + Y) = E\{[(X + Y) - E(X + Y)]^2\}$
$\qquad = E\{[(X - \mu_X) + (Y - \mu_Y)]^2\},$

where we have rearranged terms in the bracket after using (5.2).
Now we perform the indicated squaring operation to obtain

$\mathrm{Var}(X + Y) = E[(X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2]$
(5.9)    $\qquad = E[(X - \mu_X)^2] + 2E[(X - \mu_X)(Y - \mu_Y)] + E[(Y - \mu_Y)^2],$

this last equality resulting from the use of (5.4). The middle term
on the right vanishes according to (5.7). The other two terms on the
right are, by Definition 3.1, precisely Var(X) and Var(Y). We have
therefore completed the proof.

Now if X and Y are independent, then so are aX and bY for any
constants a and b. (This obvious fact is technically a consequence of
Theorem 4.2.) Hence we can apply Theorem 5.5 to aX and bY, and
so find that

$\mathrm{Var}(aX + bY) = \mathrm{Var}(aX) + \mathrm{Var}(bY).$

Now we use (3.11) to conclude that for any numbers a and b, if X
and Y are independent, then

(5.10)    $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y).$
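
Formula (5.10) is easy to check numerically for a particular pair of independent random variables. In the sketch below the two probability functions and the constants a = 3, b = -2 are made up for illustration; independence is imposed by building the joint table as the product of the marginals, as in (4.11).

```python
from fractions import Fraction as F

# Hypothetical probability functions of two independent random variables.
f = {1: F(1, 4), 2: F(1, 2), 4: F(1, 4)}   # X
g = {0: F(1, 3), 3: F(2, 3)}               # Y
a, b = 3, -2                               # arbitrary constants

# Independence: h(x, y) = f(x) g(y) for every cell.
joint = {(x, y): px * py for x, px in f.items() for y, py in g.items()}

def variance(dist):
    """Variance of a random variable given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    return sum((v - mu) ** 2 * p for v, p in dist.items())

# Var(aX + bY) computed directly from the joint table.
mu_w = sum((a * x + b * y) * p for (x, y), p in joint.items())
var_direct = sum((a * x + b * y - mu_w) ** 2 * p for (x, y), p in joint.items())

# The same variance from the right-hand side of (5.10).
var_formula = a ** 2 * variance(f) + b ** 2 * variance(g)

print(var_direct, var_formula)
```

Note that a negative constant such as b = -2 still contributes positively, since it enters through its square.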

Still more general is the following result, whose proof we leave for 
the problems. 

Theorem 5.6. Let n be any positive integer and suppose $X_1, X_2,
\ldots, X_n$ are independent random variables defined on a sample space
S. Then for any constants $a_1, a_2, \ldots, a_n$ we have

(5.11)    $\mathrm{Var}(a_1X_1 + a_2X_2 + \cdots + a_nX_n) = \sum_{i=1}^{n} a_i^2\,\mathrm{Var}(X_i).$

In particular (if $a_1 = a_2 = \cdots = a_n = 1$), the variance of the sum
of any finite number of independent random variables is equal to the
sum of their variances. Note that the corresponding result for the
mean holds for any random variables, independent or dependent.


Example 5.5. A deck of cards numbered 1, 2, ..., n is shuffled and
placed face down on the table. As each card is turned, a subject tries
to guess what number it will be. Suppose the subject does not
remember from one card to the next and calls his guesses independently
and at random; i.e., he has the same probability 1/n for a correct
guess at each trial (guess) of this n-trial experiment, and the trials
are independent. (One way to accomplish this would be for the subject to
determine each of his guesses by selecting one card at random, with
replacement, from the deck of n cards. The corresponding problem in which his
guesses are determined by drawing a random sample without replacement
is discussed in Problem 6.8 of the next section.)

Let X denote the random variable whose value for any outcome
of the n-trial experiment is the number of correct guesses made by
the subject. We shall find the mean and variance of X by expressing
X as a sum of n random variables, and then using Formulas (5.4)
and (5.11).

For k = 1, 2, ..., n, let

(5.12)    $X_k = \begin{cases} 0 & \text{if the kth guess is wrong (probability } 1 - \frac{1}{n}\text{)} \\ 1 & \text{if the kth guess is correct (probability } \frac{1}{n}\text{)} \end{cases}$

Then $X = X_1 + X_2 + \cdots + X_n$, since the value of the sum is equal
to the number of 1's in the sum, and hence is precisely the number of
correct guesses made by the subject, or the value of X. Now for
k = 1, 2, ..., n, we find

(5.13)    $E(X_k) = 0\left(1 - \frac{1}{n}\right) + 1\left(\frac{1}{n}\right) = \frac{1}{n}.$

Hence, by (5.4),

$E(X) = E(X_1 + X_2 + \cdots + X_n)$
$\qquad = E(X_1) + E(X_2) + \cdots + E(X_n)$
$\qquad = \frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}$
$\qquad = 1.$

Thus we see that the mean number of correct guesses is 1, and
therefore does not depend upon n, the number of cards in the deck.

To compute Var(X), we note that $X_k$ is determined by the kth
trial of the experiment. Since the trials are independent, we
conclude by Theorem 4.5 that $X_1, X_2, \ldots, X_n$ are independent random
variables. Hence (5.11) is applicable. Since for k = 1, 2, ..., n,

(5.14)    $\mathrm{Var}(X_k) = E(X_k^2) - [E(X_k)]^2 = \frac{1}{n} - \frac{1}{n^2} = \frac{n - 1}{n^2},$

we obtain

$\mathrm{Var}(X) = \mathrm{Var}(X_1 + X_2 + \cdots + X_n)$
$\qquad = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)$
$\qquad = n \cdot \frac{n - 1}{n^2} = \frac{n - 1}{n}.$

In Chapter 5, we shall determine the probability function of X, but
note that our method allows the calculation of the mean and variance 
of X without knowing this probability function. (Cf. Problem 3.11, 
Part 2, which is the special case of this problem when there are only 
three cards in the deck.) 
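
For small decks the conclusions of Example 5.5 can be confirmed by brute force. The sketch below enumerates all $n^n$ equally likely sequences of independent guesses against one fixed ordering of the deck (by symmetry the particular ordering should not matter) and computes the mean and variance of the number of correct guesses. The function name is hypothetical and the approach is only practical for small n.

```python
from fractions import Fraction as F
from itertools import product

def correct_guess_moments(n):
    """Mean and variance of the number of correct guesses for an n-card deck,
    found by enumerating all n**n sequences of independent random guesses."""
    deck = list(range(1, n + 1))        # one fixed order of the shuffled deck
    total = n ** n                      # number of equally likely guess sequences
    sum_hits = 0
    sum_hits_squared = 0
    for guesses in product(deck, repeat=n):
        hits = sum(g == card for g, card in zip(guesses, deck))
        sum_hits += hits
        sum_hits_squared += hits * hits
    mean = F(sum_hits, total)
    variance = F(sum_hits_squared, total) - mean ** 2
    return mean, variance

for n in range(1, 6):
    print(n, correct_guess_moments(n))
```

The enumeration never uses the probability function of X in closed form; it simply counts, which makes it an independent check on (5.13) and (5.14).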


We turn now to an application of our theorems to experiments 
made up of a number of independent repetitions of the same trial. 
Such experiments and random variables associated with them can be 
interpreted in a number of ways and, in particular, supply a mathe- 
matical model for repeated measurement in the sciences and for 
sampling with replacement in statistics. 

Suppose a bowl contains N chips, each chip having a number on
it. Some chips may have the same number on them, and we let
$x_1, x_2, \ldots, x_M$ ($M \le N$) be all the different numbers in the bowl.
Suppose the number $x_j$ occurs $f_j$ times, so that the relative frequency
with which this number appears in the bowl or the proportion of
all chips having this number is $f_j/N$. It follows that

(5.15)    $\sum_{j=1}^{M} f_j = N \quad \text{or} \quad \sum_{j=1}^{M} \frac{f_j}{N} = 1.$

If one chip is selected at random from the bowl and we denote by X
the random variable whose value is the number on this chip, then the
probability function of X is given by the following table:

    x          x_1     x_2     ...   x_M
    P(X = x)   f_1/N   f_2/N   ...   f_M/N

Thus we have a special probability function whose value for any $x_j$
is the proportion of chips with $x_j$ on them among all chips in the bowl:

(5.16)    $f(x_j) = P(X = x_j) = \frac{f_j}{N}$   for j = 1, 2, ..., M.

The reader can verify that our definitions of mean and variance, when
applied to this particular random variable X, yield the formulas

(5.17)    $\mu_X = E(X) = \frac{1}{N} \sum_{j=1}^{M} x_j f_j,$

(5.18)    $\sigma_X^2 = \mathrm{Var}(X) = \frac{1}{N} \sum_{j=1}^{M} (x_j - \mu_X)^2 f_j,$

(5.19)    $\sigma_X^2 = \mathrm{Var}(X) = \frac{1}{N} \sum_{j=1}^{M} x_j^2 f_j - \mu_X^2,$

the last equality arising by use of (3.10).
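
Formulas (5.17)-(5.19) translate directly into a short computation on a frequency table. The bowl below (ten chips carrying the numbers 1, 3, and 4) is invented purely for illustration, and the function names are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical bowl: number on chip -> how many chips carry it (f_j).
frequencies = {1: 2, 3: 5, 4: 3}

def population_mean(frequencies):
    """Formula (5.17): (1/N) * sum of x_j * f_j."""
    n_chips = sum(frequencies.values())
    return F(sum(x * f for x, f in frequencies.items()), n_chips)

def population_variance(frequencies):
    """Formula (5.18): (1/N) * sum of (x_j - mu)**2 * f_j."""
    n_chips = sum(frequencies.values())
    mu = population_mean(frequencies)
    return sum((x - mu) ** 2 * f for x, f in frequencies.items()) / n_chips

def population_variance_shortcut(frequencies):
    """Formula (5.19): (1/N) * sum of x_j**2 * f_j, minus mu**2."""
    n_chips = sum(frequencies.values())
    mu = population_mean(frequencies)
    return F(sum(x * x * f for x, f in frequencies.items()), n_chips) - mu ** 2

mu = population_mean(frequencies)
var_a = population_variance(frequencies)
var_b = population_variance_shortcut(frequencies)
print(mu, var_a, var_b)
```

The shortcut (5.19) avoids subtracting the mean from every value, which is usually the more convenient form for hand calculation.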

It is customary in this context to say that we have a population
of N chips and then to call $\mu_X$ and $\sigma_X^2$ the population mean and the
population variance of X. As we know (Theorem 1.3), we can
consider X defined on the sample space $S = \{x_1, x_2, \ldots, x_M\}$ whose
simple events are assigned probabilities as given by the probability
function of X, i.e., $P(\{x_j\}) = f_j/N$ for j = 1, 2, ..., M.

From the population of N chips, we now draw n chips, replacing
each before the next draw. We consider this as an experiment made
up of n independent trials, each trial being defined by the sample
space S and the trials being independent by virtue of our assumption
that we are sampling with replacement. Our mathematical
counterpart for this n-trial sampling experiment is the sample space given by
the Cartesian product set $S \times S \times \cdots \times S$ (n S's), together with
an assignment of probabilities in accordance with the product rule
discussed in Section II.9.

For k = 1, 2, ..., n, let $X_k$ be the random variable whose value is
the number on the kth chip drawn from the bowl. Thus we have n
random variables $X_1, X_2, \ldots, X_n$ defined on the Cartesian product
sample space. Since each trial is an exact duplicate of any other,
these random variables all have the same probability function.
Furthermore, since $X_k$ is determined by the kth trial and the trials
are independent, it follows that $X_1, X_2, \ldots, X_n$ are independent
random variables. We summarize by saying that the random
variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed, each
with mean $\mu_X$ and variance $\sigma_X^2$.

It is thus clear that our sampling experiment is completely
determined as soon as we know the common probability function of the
$X_k$'s. For then we are given the possible sample values that can arise
in each trial, together with their probabilities. In other words, it is
meaningful to talk of a population specified by the probability
function of a random variable X. Indeed, for this reason sampling with
replacement is often referred to as sampling from a probability
function.

The random variable $\bar{X}$ given by

(5.20)    $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

is called the sample mean of X. The value of $\bar{X}$ for any selection of n
chips is just the arithmetic mean or average of the numbers on the
chips. An experimenter who has incomplete knowledge of the
composition of the bowl could nevertheless draw his random sample from
the population and obtain a value of the sample mean $\bar{X}$. For
example, if we think of the chips as corresponding to N people in a
given population and the number on each chip as the income of the
corresponding person, then the value of $\bar{X}$ is just the average of the
n incomes selected in the random sample. Or, if the numbers on the
chips are N possible measurements of some quantity, say the length
of a bar to the nearest thousandth of a centimeter or the time to the
nearest tenth of a second that it takes a rat to complete a maze, then
the value of $\bar{X}$ is just the average of n such measurements. Before
studying the random variable $\bar{X}$ in general, we pause to present a
particular example to help fix these ideas.


Example 5.6. Suppose our population is specified by the probability
table

    x          -1    0     2
    P(X = x)   .1    .5    .4

The reader can easily verify that the population mean and variance
are given by

$\mu_X = .7, \quad \sigma_X^2 = 1.21.$

TABLE 32

              Probability of          Value of $\bar{X}$
    Sample    Drawing This Sample     For This Sample
    -1, -1          .01                    -1
    -1, 0           .05                   -1/2
    -1, 2           .04                    1/2
    0, -1           .05                   -1/2
    0, 0            .25                     0
    0, 2            .20                     1
    2, -1           .04                    1/2
    2, 0            .20                     1
    2, 2            .16                     2

In Table 32, we list all possible samples of size n = 2 taken with
replacement from this population, the probability of obtaining each
sample, and the corresponding value of the sample mean $\bar{X}$. We
thus find the following probability table for $\bar{X}$, the sample mean:

    $\bar{x}$                  -1    -1/2    0     1/2    1     2
    $P(\bar{X} = \bar{x})$     .01   .10    .25   .08    .40   .16

And the reader can again verify that $\mu_{\bar{X}}$, the mean of $\bar{X}$, and $\sigma_{\bar{X}}^2$, the
variance of $\bar{X}$, are given by

$\mu_{\bar{X}} = .7, \quad \sigma_{\bar{X}}^2 = .605.$

We observe that

$\mu_{\bar{X}} = \mu_X \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{2},$

i.e., the mean of the sample mean $\bar{X}$ is equal to the population mean
of X, and the variance of the sample mean is equal to the population
variance of X divided by the sample size. In other words, although
the values of X and of $\bar{X}$ have the same average, the values of $\bar{X}$
spread less about this common average than do the values of X.

A similar procedure for samples of size n = 3 (there are now 27
possible samples) yields the following probability table for the sample
mean $\bar{X}$. (Note that we do not complicate our notation by explicitly
indicating the sample size when we write the symbol for the sample
mean. It is therefore important to keep the sample size clearly in
mind when writing $\bar{X}$.)

    $\bar{x}$                  -1     -2/3   -1/3    0      1/3    2/3    1      4/3    2
    $P(\bar{X} = \bar{x})$     .001   .015   .075   .137   .120   .300   .048   .240   .064

We compute the mean and variance of $\bar{X}$ and find

$\mu_{\bar{X}} = .7, \quad \sigma_{\bar{X}}^2 = .403\ldots,$

so that for samples of size 3,

$\mu_{\bar{X}} = \mu_X \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{3}.$

We see that X and $\bar{X}$ again have the same mean, but as compared
with the values of X (which can be considered values of $\bar{X}$ for samples
of size 1) or the values of $\bar{X}$ for samples of size 2, the values of $\bar{X}$ for
samples of size 3 show less spread about the common mean $\mu_X$. This
fact corresponds to our intuitive feeling that we improve our estimate
of the population mean as we take averages based on larger and
larger samples from the population. We return to this point and
make it precise in the theorems that follow.
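
The tabulations of Example 5.6 can be reproduced by enumerating every ordered sample. The sketch below assumes the population has values -1, 0, 2 with probabilities .1, .5, .4, which is what the sample probabilities in Table 32 (.01, .25, .16 for the three repeated samples) imply. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Assumed population, as implied by the sample probabilities in Table 32.
population = {-1: F(1, 10), 0: F(5, 10), 2: F(4, 10)}

def sample_mean_distribution(population, n):
    """Probability function of the sample mean for n draws with replacement."""
    dist = defaultdict(F)
    for sample in product(population, repeat=n):
        prob = F(1)
        for value in sample:
            prob *= population[value]       # product rule for independent trials
        dist[F(sum(sample), n)] += prob     # collect samples with equal mean
    return dict(dist)

def mean_and_variance(dist):
    """Mean and variance of a random variable given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, var

for n in (1, 2, 3):
    dist = sample_mean_distribution(population, n)
    print(n, mean_and_variance(dist))
```

For each n the printed variance should be the population variance divided by n, in line with the pattern observed in the example.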


With the results of this example before us, it should come as no 
surprise that the following general theorem holds. 

Theorem 5.7. Let n be any positive integer and let $X_1, X_2, \ldots, X_n$
be n independent, identically distributed random variables, each with
mean $\mu_X$ and variance $\sigma_X^2$. If

$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$

then

(5.22)    $\mu_{\bar{X}} = \mu_X \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}.$

In words, for sampling with replacement from a population given by
the probability function of a random variable X, the mean of the
sample mean $\bar{X}$ is equal to the population mean of X, and the variance
of the sample mean is equal to the variance of X divided by the sample
size.

Proof. We first apply (5.4) to obtain

$\mu_{\bar{X}} = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n}[E(X_1) + \cdots + E(X_n)] = \frac{1}{n}(n\mu_X) = \mu_X.$

Similarly, applying (5.11) with $a_1 = \cdots = a_n = 1/n$, we find

$\sigma_{\bar{X}}^2 = \mathrm{Var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n^2}[\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)] = \frac{1}{n^2}(n\sigma_X^2) = \frac{\sigma_X^2}{n}.$

Note that the standard deviation of the sample mean is the standard
deviation of X divided by the square root of the sample size:

(5.23)    $\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}.$

Thus, as the sample size n increases, the values of the sample mean
$\bar{X}$ tend to become more concentrated about the mean $\mu_X$.

Observe that Theorem 5.7 was proved without finding the probability
function of $\bar{X}$. For applications of this theorem to many
problems, it becomes important to know more about $\bar{X}$ than its
mean and standard deviation. Unfortunately, we must leave these
interesting matters at this point, for they lead to probability
problems that cannot be formulated using finite sample spaces.

We can however use Theorem 5.7 to prove the following result,
which is a special form of the so-called law of large numbers.


Theorem 5.8. Let a population be specified by a random variable
X with mean $\mu_X$ and standard deviation $\sigma_X$. Let $\bar{X}$ be the mean of a
random sample of size n drawn with replacement from this population.
Let c be any positive number. Then as n increases without
bound,

(5.24)    $P(\mu_X - c \le \bar{X} \le \mu_X + c)$

approaches 1. In other words, by choosing the sample size n
sufficiently large, the probability that the value of the sample mean
differs from the population mean by at most c can be made as close to 1
as we like. Or, more colloquially, since c can be taken as small as we
please: by choosing the sample size sufficiently large, we can be as
sure as we like (short of certainty) that the value of the sample mean
will be as near the population mean as we like.

Proof. To the random variable $\bar{X}$ we apply Chebyshev's inequality
(3.15) and find that

$P(|\bar{X} - \mu_{\bar{X}}| > c) \le \frac{\sigma_{\bar{X}}^2}{c^2}.$

We use (5.22) to write $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}^2$ in terms of the population mean and
variance. Then

$P(|\bar{X} - \mu_X| > c) \le \frac{\sigma_X^2}{nc^2}.$

Hence

(5.25)    $P(|\bar{X} - \mu_X| \le c) \ge 1 - \frac{\sigma_X^2}{nc^2}.$

But as n increases, the quantity $\sigma_X^2/nc^2$ decreases and approaches
zero. Hence $1 - \frac{\sigma_X^2}{nc^2}$ approaches 1 as n gets larger and larger, and so
$P(|\bar{X} - \mu_X| \le c)$, which is just the probability in (5.24), can be made
as close to 1 as we like by choosing n sufficiently large. This
completes the proof.


Example 5.7. Let the value of X corresponding to each person in
a certain population be that person's annual income in thousands of
dollars. Suppose $\mu_X = 6.5$ and $\sigma_X = 2.1$. A random sample of n
persons is drawn with replacement from this population and a value
of $\bar{X}$, the average income of these n persons, is obtained. We want the
probability to be greater than .9 that this value differs from the
population mean by at most .5. How large must the sample size n
be?

We seek the smallest value of n for which

(5.26)    $P(|\bar{X} - \mu_X| \le .5) > .9.$

Putting c = .5 and $\sigma_X = 2.1$ in (5.25) we find

$P(|\bar{X} - \mu_X| \le .5) \ge 1 - \frac{17.64}{n}.$

Hence (5.26) will be true if 17.64/n is less than .1 or if n > 176.4.
The desired closeness of the sample mean income and the population
mean income is therefore achieved with a sample of size n = 177.
(This is a most conservative figure, since it applies no matter what
probability function X has. In more advanced work, one derives the
approximate form of the probability function of $\bar{X}$, and it is then
possible to show that a sample of size n = 50 will suffice in this
example.)
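
The arithmetic of Example 5.7 generalizes to any standard deviation, tolerance c, and target probability. The sketch below solves $1 - \sigma_X^2/(nc^2) > p$ for the smallest integer n; it relies only on bound (5.25), so the answer is a sufficient sample size rather than the least one that would work for a particular population. The function name is hypothetical.

```python
import math
from fractions import Fraction as F

def chebyshev_sample_size(sigma, c, p):
    """Smallest integer n with 1 - sigma**2 / (n * c**2) > p, from (5.25)."""
    # The inequality is equivalent to n > sigma**2 / (c**2 * (1 - p)).
    threshold = sigma ** 2 / (c ** 2 * (1 - p))
    return math.floor(threshold) + 1      # strictly greater than the threshold

# Data of Example 5.7, as exact fractions to avoid rounding at the boundary.
n_needed = chebyshev_sample_size(F(21, 10), F(1, 2), F(9, 10))
print(n_needed)
```

Passing fractions rather than floats keeps the comparison exact when the threshold happens to be a whole number.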


We conclude by showing how the law of large numbers can be used
to supply a theoretical counterpart to our intuitive feeling that if an
event A occurs f times in n identical trials and if n is large, then f/n,
the proportion of times A occurs, should be near the probability P(A)
of the event A. We let the random variable X have the value 1 if
the event A occurs, and have the value 0 otherwise. Thus X has the
following probability table:

    x          0          1
    P(X = x)   1 - P(A)   P(A)

We note that $\mu_X = P(A)$. Also $\bar{X}$, the mean of a random sample of
size n drawn with replacement from the population specified by the
random variable X, is just the proportion of times the event A occurs.
(For $X_1 + \cdots + X_n$ is the number of times A occurs and $\bar{X}$ is
this number divided by n.) According to Theorem 5.8, by making n
sufficiently large, the probability can be made arbitrarily close to 1
that the proportion of times A occurs will be as close as we like to
the probability of A. (This fact, due to James Bernoulli, dates back
to 1713.) It is in this form that we find support for the interpretation
of probabilities as proportions in a large number of repeated
independent trials.*
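
To see this statement at work, one can compare the exact probability that the proportion f/n lies within c of P(A) with the lower bound supplied by (5.25). The sketch below goes beyond this section in one respect, which should be read as an assumption: it uses the binomial probabilities $\binom{n}{k} p^k (1 - p)^{n-k}$ for the number of occurrences of A in n independent trials, and the variance p(1 - p) of the indicator variable X. The values p = 1/2 and c = 1/10 are arbitrary, and the function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb

def prob_proportion_within(n, p, c):
    """Exact P(|f/n - p| <= c), where f counts occurrences of A in n trials
    (binomial probabilities are assumed, not derived in this section)."""
    total = F(0)
    for k in range(n + 1):
        if abs(F(k, n) - p) <= c:
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

def chebyshev_lower_bound(n, p, c):
    """Right-hand side of (5.25) with population variance p * (1 - p)."""
    return 1 - p * (1 - p) / (n * c ** 2)

p, c = F(1, 2), F(1, 10)
for n in (10, 100, 1000):
    exact = prob_proportion_within(n, p, c)
    bound = chebyshev_lower_bound(n, p, c)
    print(n, float(exact), float(bound))
```

The exact probabilities climb toward 1 well ahead of the bound, which illustrates how conservative the Chebyshev estimate can be.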


PROBLEMS 


5.1. Suppose X and Y have the following joint probability table: 


(a) Determine the probability function of X + Y, and thus compute 
E(X + Y). Check your answer by using (5.2). 

(b) Determine the probability function of XY and thus compute 
E(XY). Then check your answer by using (5.1) to find E(XY) 
directly from the joint probability table. 

(c) Show that (5.6) is true but that X and Y are dependent. 


5.2. Let X and Y have the joint probability table given in the preceding
problem. In each of the following parts, a function z is defined by
giving its value z(x, y) for any real numbers x and y. Determine the
probability function of the random variable z(X, Y) and calculate
E[z(X, Y)] from the probability function, and also by using (5.1).

* There are other interpretations of probability. See for example, the
discussions in L. J. Savage, The Foundations of Statistics, John Wiley and Sons, Inc.,
1954, especially pp. 3-4, 56-68, and in E. Nagel, Principles of the Theory of
Probability, International Encyclopedia of Unified Science, Vol. 1, No. 6, University of
Chicago Press, 1939.

(a) $z(x, y) = \min(x, y) = \begin{cases} x & \text{if } x \le y \\ y & \text{if } x > y \end{cases}$
(b) $z(x, y) = \max(x, y) = \begin{cases} y & \text{if } x \le y \\ x & \text{if } x > y \end{cases}$
(c) $z(x, y) = x/y$
(d) $z(x, y) = y/x$
(e) $z(x, y) = x^2 + y^2$
(f) $z(x, y) = \sqrt{x^2 + y^2}$


5.3. Which of the following are true for all random variables X and Y
defined on a sample space? For those that are true for some but not all
X and Y, find a pair X, Y for which the statement is true and another
pair for which it is false. (Cf. the preceding problem.)

(a) $E[\min(X, Y)] = \min[E(X), E(Y)]$
(b) $E[\max(X, Y)] = \max[E(X), E(Y)]$
(c) $E(X/Y) = E(X)/E(Y)$
(d) $E(X/Y) = 1/E(Y/X)$
(e) $E(X^2 + Y^2) = E(X^2) + E(Y^2)$
(f) $\{E[\sqrt{X^2 + Y^2}]\}^2 = E(X^2 + Y^2)$


5.4. In the text, corollary (5.7) is proved using the identity in (5.5). Prove
the corollary without using this identity. (Hint: Use Theorem 4.3.)


5.5. X has mean 50 and standard deviation 12. Y has mean 30 and standard
deviation 5. X and Y are independent. Find the mean and standard
deviation of (a) X + Y, (b) X - Y, (c) 3X + 2Y. (Note: $\sigma_{X+Y} \ne
\sigma_X + \sigma_Y$.)


5.6. Start with the definition of Var(X) given in (3.6) and use the theorems
of the present section to prove that $\mathrm{Var}(X) = E(X^2) - \mu_X^2$. (Cf.
Theorem 3.2 and its proof.)


5.7. (a) Interpret the result E(X + b) = E(X) + b established in Section 2
as a special case of Theorem 5.2. What then is the random
variable Y?

(b) Interpret the result E(aX) = aE(X) established in Section 2 as a
special case of Theorem 5.4. What then is the random variable Y
and why (as needed to apply Theorem 5.4) are X and Y
independent?


5.8. Prove Theorem 5.1.


5.9. Generalize Theorem 5.4 by proving that if $X_1, X_2, \ldots, X_n$ are
independent (n any positive integer), then

$E(X_1X_2 \cdots X_n) = E(X_1)E(X_2) \cdots E(X_n).$


5.10. Let $V_1, V_2, V_3$ be independent random variables. Define $X = V_1 + V_2$
and $Y = V_1 + V_3$. Show that

$E(XY) - E(X)E(Y) = \mathrm{Var}(V_1).$


5.11. (a) Let $X_1, X_2, \ldots, X_n$ be independent random variables. Show that
if k is any positive integer less than n and $Y_k = a_1X_1 + \cdots + a_kX_k$,
then $Y_k$ and $X_{k+1}$ are independent.

(b) Prove Theorem 5.6 by mathematical induction.


5.12. A random sample of size n is drawn with replacement from a population
and we find the sample mean $\bar{X}$ has mean $\mu_{\bar{X}}$ and standard deviation
$\sigma_{\bar{X}}$. What happens to $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ if the sample size is quadrupled?


5.13. Let a population be specified by the following probability table of the 
random variable X: 


(a) Find ux and ox, 


the population mean and standard deviation. 
(b) List all possible sai 


mples of size 2 drawn with replacement from ee 
population and determine the probability function of X, the sample 


mean for samples of size 2, Compute us and a% from this prob- 
ability function, and thus check (5.22). 


(c) Repeat part (b) for samples of size 3. 
5.14. Verify formulas (5.17)-(5.19). 


5.15. The incomes of ten people are given in the following frequency table.


    Income        Number of People with This Income
    $x_j$         $f_j$

    $3500
    5000
    7500
    9000


(a) Use formulas (5.17)-(5.19) to compute $\mu_X$ and $\sigma_X^2$, the mean and
variance of the incomes in this population. (Note: The computations
are simplified if you "code" the incomes, for example, by letting
Y = (X - 5000)/500. The three people with $3500 incomes (X-values)
are thus given coded incomes (Y-values) of -3, the four people with
$5000 incomes are given coded incomes of 0, etc. Since the Y-values
are small numbers, it is relatively easy to compute $\mu_Y$ and $\sigma_Y^2$ by
using (5.17)-(5.19) with $x_j$ replaced by $y_j$. By using the coding
equation relating Y and X, you can easily find $\mu_X$ and $\sigma_X^2$ from
$\mu_Y$ and $\sigma_Y^2$.)

(b) Determine the probability function of $\bar{X}$, the sample mean, based
on samples of size 2 drawn with replacement from the population
of ten people. Then compute $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}^2$ and thus check (5.22).


5.16. Two samples of size $n_1$ and $n_2$ respectively are drawn with
replacement from a population specified by the probability function of a
random variable X with mean $\mu_X$ and standard deviation $\sigma_X$. Let
$\bar{X}_1$ and $\bar{X}_2$ denote the respective sample means and suppose these
means are independent random variables. Show that

    $E(\bar{X}_1 - \bar{X}_2) = 0$,

    $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma_X^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$.


5.17. Let $X_1, X_2, \ldots, X_n$ be independent, identically distributed
random variables, each with mean $\mu_X$ and standard deviation $\sigma_X$.
(That is, a sample of size n is drawn with replacement from a population
given by the probability function of X.) In the text we defined the
random variable $\bar{X}$, the sample mean, and found its mean and variance.
The sample variance can also be defined. It is a random variable,
denoted by $S^2$, and given by

    $S^2 = \frac{1}{n - 1} \sum_{k=1}^{n} (X_k - \bar{X})^2$.

To find $E(S^2)$ proceed as follows:

(a) By writing $X_k - \bar{X} = (X_k - \mu_X) - (\bar{X} - \mu_X)$, show that

    $S^2 = \frac{1}{n - 1} \sum_{k=1}^{n} (X_k - \mu_X)^2 - \frac{n}{n - 1} (\bar{X} - \mu_X)^2$.

(b) Show that

    $E[(X_k - \mu_X)^2] = \mathrm{Var}(X)$,    $E[(\bar{X} - \mu_X)^2] = \mathrm{Var}(\bar{X})$.

(c) Conclude that

    $E(S^2) = \sigma_X^2$,

that is, the mean of the sample variance is equal to the population
variance.
6. Covariance and correlation; the sample mean (cont.)


Suppose we are given the joint probability table of two random
variables X and Y defined on the same sample space S. Each of these
variables has a mean and a variance, but the joint probability table
is not needed to compute $\mu_X$, $\sigma_X^2$ and $\mu_Y$, $\sigma_Y^2$; these numbers are
determined by the (marginal) probability functions of X and Y. In
this section, we define some numbers that measure how the possible
values of X are related to the possible values of Y; such numbers
will depend upon the joint probability function of X and Y.

We are led to our first definition by reviewing the proof of Theorem
5.5. There we showed that
(6.1)    $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2E[(X - \mu_X)(Y - \mu_Y)]$,
and since X and Y were assumed to be independent, we invoked
(5.7) to conclude that the last term in (6.1) vanishes. But (6.1) is a
result worth having, since it holds for dependent as well as
independent random variables. We therefore want now to study the last
term in (6.1), a term that we so hurriedly skipped over in the
preceding section. As usual, a special symbol and name are introduced.

Definition 6.1. Let X and Y be random variables defined on a
sample space S. The covariance of X and Y, denoted by Cov(X, Y),
is defined as the number given by
(6.2)    $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
or equivalently, because of the identity in (5.5),
(6.3)    $\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$.
Using this notation we rewrite (6.1) and obtain
(6.4)    $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$.
Let us also note here that Definition 6.1 treats X and Y
symmetrically, i.e.,
(6.5)    $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$,
and since $X - \mu_X$ and $Y - \mu_Y$ each have mean zero,
(6.6)    $\mathrm{Cov}(X - \mu_X, Y - \mu_Y) = \mathrm{Cov}(X, Y)$.
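As a check on Definition 6.1, the following Python sketch computes the
covariance of a small joint probability table in both ways, by (6.2) and
by (6.3), and then confirms (6.4). The table `joint` and the helper names
`expect`, `variance`, and `covariance` are hypothetical, chosen only for
illustration; exact fractions are used so that the comparisons are not
blurred by rounding.

```python
from fractions import Fraction as F

# Hypothetical joint probability table: keys are (x, y),
# values are P(X = x, Y = y). The entries add to 1.
joint = {
    (0, 0): F(1, 4), (0, 1): F(1, 4),
    (1, 1): F(1, 4), (2, 1): F(1, 8), (2, 2): F(1, 8),
}

def expect(g, table):
    """E[g(X, Y)] computed directly from the joint table."""
    return sum(g(x, y) * p for (x, y), p in table.items())

def variance(g, table):
    """Variance of the random variable g(X, Y)."""
    m = expect(g, table)
    return expect(lambda x, y: (g(x, y) - m) ** 2, table)

def covariance(table):
    """Return Cov(X, Y) computed by (6.2) and by (6.3)."""
    mu_x = expect(lambda x, y: x, table)
    mu_y = expect(lambda x, y: y, table)
    by_62 = expect(lambda x, y: (x - mu_x) * (y - mu_y), table)
    by_63 = expect(lambda x, y: x * y, table) - mu_x * mu_y
    return by_62, by_63

cov_62, cov_63 = covariance(joint)
var_x = variance(lambda x, y: x, joint)
var_y = variance(lambda x, y: y, joint)
var_sum = variance(lambda x, y: x + y, joint)

print(cov_62, cov_63)                       # the two definitions agree
print(var_sum == var_x + var_y + 2 * cov_62)  # identity (6.4)
```

Because the table lists dependent random variables, the covariance term
in (6.4) does not vanish here, and dropping it would give the wrong
variance for the sum.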

Many problems require a generalization of (6.4) to more than two
random variables. For example, by the definition of variance we
find
    $\mathrm{Var}(X_1 + X_2 + X_3) = E\{[X_1 + X_2 + X_3 - E(X_1 + X_2 + X_3)]^2\}$.
If we use (5.4) and write $\mu_j$ for $E(X_j)$, then
(6.7)    $\mathrm{Var}(X_1 + X_2 + X_3) = E\{[(X_1 - \mu_1) + (X_2 - \mu_2) + (X_3 - \mu_3)]^2\}$.

Let us now recall from algebra (or by using the multinomial theorem
of Chapter 3) that

    $(a_1 + a_2 + a_3)^2 = a_1^2 + a_2^2 + a_3^2 + 2a_1a_2 + 2a_1a_3 + 2a_2a_3$
    $= \sum_{j=1}^{3} a_j^2 + 2 \sum_{j<k} a_j a_k$,

where the last sum is understood to include the $\binom{3}{2} = 3$
cross-products $a_j a_k$ with $1 \le j < k \le 3$. Applying this to (6.7)
by putting $a_j = X_j - \mu_j$, we find

    $\mathrm{Var}(X_1 + X_2 + X_3) = E\left[ \sum_{j=1}^{3} (X_j - \mu_j)^2 + 2 \sum_{j<k} (X_j - \mu_j)(X_k - \mu_k) \right]$.

Now using (5.4) again and the definition of variance and covariance,
we obtain

(6.8)    $\mathrm{Var}(X_1 + X_2 + X_3) = \sum_{j=1}^{3} \mathrm{Var}(X_j) + 2 \sum_{j<k} \mathrm{Cov}(X_j, X_k)$.

The derivation just completed, if applied to the sum of n random
variables, leads to the following general result. We leave writing out
the proof as an exercise for the reader.


Theorem 6.1. If $X_1, X_2, \ldots, X_n$ are any random variables (n > 1),
then

(6.9)    $\mathrm{Var}(X_1 + \cdots + X_n) = \sum_{j=1}^{n} \mathrm{Var}(X_j) + 2 \sum_{j<k} \mathrm{Cov}(X_j, X_k)$,

the last sum including the $\binom{n}{2}$ terms $\mathrm{Cov}(X_j, X_k)$ with subscripts
satisfying $1 \le j < k \le n$.
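Formula (6.9) can be checked numerically on any small sample space. The
sketch below is one such check, under assumptions of our own choosing:
the sample space is the 36 equally likely outcomes of two fair dice, and
the three random variables (first die, larger die, sum) are hypothetical
examples picked because they are clearly dependent.

```python
from fractions import Fraction as F
from itertools import combinations, product

# Sample space: ordered outcomes of two fair dice, each with probability 1/36.
outcomes = list(product(range(1, 7), repeat=2))
prob = F(1, len(outcomes))

# Three dependent random variables defined on this sample space
# (hypothetical choices for illustration).
variables = [
    lambda o: o[0],         # number on the first die
    lambda o: max(o),       # larger of the two numbers
    lambda o: o[0] + o[1],  # sum of the two numbers
]

def mean(g):
    return sum(g(o) * prob for o in outcomes)

def cov(g, h):
    """Cov(g, h) by definition; cov(g, g) is the variance of g."""
    mg, mh = mean(g), mean(h)
    return sum((g(o) - mg) * (h(o) - mh) * prob for o in outcomes)

def total(o):
    return sum(v(o) for v in variables)

lhs = cov(total, total)  # Var(X1 + X2 + X3)
rhs = (sum(cov(v, v) for v in variables)
       + 2 * sum(cov(g, h) for g, h in combinations(variables, 2)))

print(lhs == rhs)  # both sides of (6.9) agree exactly
```

The call `combinations(variables, 2)` produces exactly the pairs with
j < k, so each covariance is counted once and then doubled, as in (6.9).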
In the preceding section (pp. 221-226) we discussed some questions
involving sampling with replacement from a population specified by the
probability function of a random variable. In particular
we derived Formulas (5.22) for the mean and variance of the sample
mean $\bar{X}$ based on random samples of size n drawn with replacement.
We are now able to prove the corresponding results for the important
case where random samples of size n are drawn without replacement
from a finite population of N elements.

We suppose, as before, that we have a population of N chips and
that each has a number on it. Let $x_1, x_2, \ldots, x_M$ be all the different
numbers, and suppose $x_j$ appears on $f_j$ chips for $j = 1, 2, \ldots, M$.
Then, precisely as in (5.16)-(5.19) of the preceding section, we define
the random variable X and the population mean $\mu_X$ and population
variance $\sigma_X^2$.

From this population of N chips we draw one chip at random, then
another at random from the remaining N - 1 chips, and so on until
a random sample of size n ($n \le N$) is drawn without replacement
from the population. We again let $X_k$ be the random variable whose
value is the number on the kth chip drawn. Thus the sample mean
$\bar{X}$ is given by (5.20), as before. Our task is to compute the mean and
variance of $\bar{X}$.

The random variables $X_1, X_2, \ldots, X_n$ are independent when the
sample is drawn with replacement. What makes our present analysis
more complicated is the fact that, in sampling without replacement,
these random variables are dependent. For example, we have

(6.10)    $P(X_1 = x_j) = \frac{f_j}{N}$  for $j = 1, 2, \ldots, M$,

but knowing the outcome of the first draw changes the probability of
getting the number $x_j$ on the second draw:

(6.11)    $P(X_2 = x_j \mid X_1 = x_j) = \frac{f_j - 1}{N - 1}$,    $P(X_2 = x_j \mid X_1 = x_k) = \frac{f_j}{N - 1}$  for $k \ne j$.
But, in spite of being dependent, the random variables
$X_1, X_2, \ldots, X_n$ are, as in the preceding section, all identically
distributed with probability function the same as that of X; i.e., for
$k = 1, 2, \ldots, n$, we have

(6.12)    $P(X_k = x_j) = \frac{f_j}{N}$  for $j = 1, 2, \ldots, M$.

This means that in the absence of information about the preceding
draws, the probability that the kth draw results in a chip bearing
the number $x_j$ is just the proportion of chips bearing this number
among all chips in the population. For the first draw this is clear and
recorded in (6.10). We now prove it is true for the second draw. By
Formula (II.6.1),

    $P(X_2 = x_j) = \sum_{k=1}^{M} P(X_2 = x_j \mid X_1 = x_k) P(X_1 = x_k)$.

Because of (6.11), we must isolate the term with k = j. Then

    $P(X_2 = x_j) = P(X_2 = x_j \mid X_1 = x_j) P(X_1 = x_j) + \sum_{k=1}^{M}{}^{*} P(X_2 = x_j \mid X_1 = x_k) P(X_1 = x_k)$,

where the asterisk means that the sum does not include the term with
k = j. Continuing, and noting that the frequencies other than $f_j$ add
to $N - f_j$,

    $P(X_2 = x_j) = \frac{f_j - 1}{N - 1} \cdot \frac{f_j}{N} + \sum_{k=1}^{M}{}^{*} \frac{f_j}{N - 1} \cdot \frac{f_k}{N}$
    $= \frac{f_j}{N(N - 1)} \left( f_j - 1 + \sum_{k=1}^{M}{}^{*} f_k \right)$
    $= \frac{f_j}{N(N - 1)} (N - 1) = \frac{f_j}{N}$, as claimed.

We leave the proof that (6.12) is also true for $k = 3, 4, \ldots, n$ for the
problems.
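Although the general proof of (6.12) is left for the problems, the claim
is easy to confirm by brute force for a particular population. The sketch
below lists every ordered sample of n chips drawn without replacement and
tabulates the distribution of each draw separately. The chip numbers in
`chips` and the sample size are hypothetical, and the enumeration is a
check on one case, not a proof.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import permutations

chips = [1, 1, 1, 4, 4, 7]  # hypothetical population, N = 6 chips
n = 3                       # sample size

# Every ordered sample of n distinct chips (identified by position) is
# equally likely when drawing at random without replacement.
samples = list(permutations(range(len(chips)), n))
p = F(1, len(samples))

def draw_distribution(k):
    """Probability function of the number on draw k (k = 0 is the first draw)."""
    dist = Counter()
    for s in samples:
        dist[chips[s[k]]] += p
    return dict(dist)

# Proportion f_j / N of chips bearing each number, as in (6.12).
population = {x: F(f, len(chips)) for x, f in Counter(chips).items()}

for k in range(n):
    print(k + 1, draw_distribution(k) == population)
```

Each draw, taken by itself, has the population's probability function,
even though the draws are dependent on one another.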

From (6.12) it follows that $E(X_k) = \mu_X$ for $k = 1, 2, \ldots, n$, and
so we apply (5.4) to obtain

    $\mu_{\bar{X}} = E\left[ \frac{1}{n} (X_1 + \cdots + X_n) \right] = \frac{1}{n} [E(X_1) + \cdots + E(X_n)] = \mu_X$,

as in the case of sampling with replacement. The fact that
$X_1, X_2, \ldots, X_n$ are no longer independent does not influence the
calculation of $\mu_{\bar{X}}$, since (5.4) holds for dependent as well as
independent random variables.
It is in the calculation of the variance of $\bar{X}$ that the dependence of
the $X_k$'s complicates matters. We must use (6.9) and so need to
compute $\mathrm{Cov}(X_j, X_k)$. (We know that $\mathrm{Var}(X_j) = \sigma_X^2$, as given in
(5.18), since $X_j$ has the same probability function as X for
$j = 1, 2, \ldots, n$.) A saving grace is the fact that

    $\mathrm{Cov}(X_j, X_k) = \mathrm{Cov}(X_1, X_2)$  for all $j \ne k$.

This equality follows from the observation (see Problem 6.6) that
each pair of random variables taken from $X_1, X_2, \ldots, X_n$ has the
same joint probability function as any other pair. It therefore
suffices to compute $\mathrm{Cov}(X_1, X_2)$, and it is to this task that we now
turn our attention.


By the definition of covariance, we have

    $\mathrm{Cov}(X_1, X_2) = E[(X_1 - \mu_X)(X_2 - \mu_X)]$
    $= \sum_{\text{all } j,k} (x_j - \mu_X)(x_k - \mu_X) P(X_1 = x_j, X_2 = x_k)$.

Again we isolate the terms with j = k and indicate their absence by
placing an asterisk on the summation symbol. Then

    $\mathrm{Cov}(X_1, X_2) = \sum_{j=1}^{M} (x_j - \mu_X)^2 P(X_1 = x_j, X_2 = x_j)$
    $+ \sum_{\text{all } j,k}{}^{*} (x_j - \mu_X)(x_k - \mu_X) P(X_1 = x_j, X_2 = x_k)$

(6.13)    $= \sum_{j=1}^{M} (x_j - \mu_X)^2 \frac{f_j (f_j - 1)}{N(N - 1)} + \sum_{\text{all } j,k}{}^{*} (x_j - \mu_X)(x_k - \mu_X) \frac{f_j f_k}{N(N - 1)}$.

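To evaluate this last sum, we use the following device. Since
$E(X_1) = \mu_X$, we know that $E(X_1 - \mu_X) = 0$, i.e.,

    $\sum_{j=1}^{M} (x_j - \mu_X) f_j = 0$.

Now square both sides of this equation to find

    $\sum_{j=1}^{M} (x_j - \mu_X)^2 f_j^2 + \sum_{\text{all } j,k}{}^{*} (x_j - \mu_X)(x_k - \mu_X) f_j f_k = 0$.

The second sum is therefore the negative of the first. We use this in
(6.13) and obtain

    $\mathrm{Cov}(X_1, X_2) = \frac{1}{N(N - 1)} \sum_{j=1}^{M} (x_j - \mu_X)^2 f_j (f_j - 1) - \frac{1}{N(N - 1)} \sum_{j=1}^{M} (x_j - \mu_X)^2 f_j^2$
    $= -\frac{1}{N(N - 1)} \sum_{j=1}^{M} (x_j - \mu_X)^2 f_j$
    $= -\frac{\sigma_X^2}{N - 1}$,

the final equality following from (5.18). Now at last we can apply
Formula (6.9) to find $\sigma_{\bar{X}}^2$:

    $\sigma_{\bar{X}}^2 = \mathrm{Var}\left[ \frac{1}{n} (X_1 + \cdots + X_n) \right] = \frac{1}{n^2} \mathrm{Var}(X_1 + \cdots + X_n)$
    $= \frac{1}{n^2} \left[ \sum_{j=1}^{n} \mathrm{Var}(X_j) + 2 \sum_{j<k} \mathrm{Cov}(X_j, X_k) \right]$
    $= \frac{1}{n^2} \left[ n \sigma_X^2 + 2 \binom{n}{2} \left( -\frac{\sigma_X^2}{N - 1} \right) \right]$
    $= \frac{\sigma_X^2}{n} \left( 1 - \frac{n - 1}{N - 1} \right) = \frac{\sigma_X^2}{n} \left( \frac{N - n}{N - 1} \right)$.

With this lengthy calculation, we have completed the proof of the
following important theorem.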


Theorem 6.2. From a population of N elements with mean $\mu_X$ and
variance $\sigma_X^2$, a random sample of size n is drawn without replacement.
Let $\bar{X}$ be the sample mean. Then

(6.14)    $\mu_{\bar{X}} = \mu_X$  and  $\sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n} \left( \frac{N - n}{N - 1} \right)$.


Before discussing these results we give a numerical example.


Example 6.1. Suppose we have a population of N = 5 people and
know the IQ score of each. Table 33 summarizes the available
information concerning the population. By using Formulas (5.17)-(5.19),
the reader can check that for this population the mean IQ score and
the variance of the IQ scores are

(6.15)    $\mu_X = 108$  and  $\sigma_X^2 = 376$,

so that the standard deviation of IQ scores in the population is
$\sigma_X = 19.4$, approximately.


TABLE 33

    IQ Score      Number of People with This IQ Score
    $x_j$         $f_j$

    80            1
    100           2
    130           2

A random sample of n = 2 people is drawn without replacement
from this population. In Table 34 we list all possible results of this
sampling experiment, together with the information needed to
determine the probability function of $\bar{X}$, the sample mean. Note that
we are selecting two people from five people, not two IQ scores from
the three different IQ scores. This means, for example, that we can
get the scores 80 and 100 in that order in two ways, since two people
have IQ scores of 100. There are altogether 5 · 4 = 20 ways of
selecting first one person and then another from the population. Thus
the numbers in the third column of Table 34 add to 20.


TABLE 34

    IQ Score of      IQ Score of       Number of Ways of         Value of $\bar{X}$
    First Person     Second Person     Drawing These Scores      for This
    Selected         Selected          in the Stated Order       Sample

    80               100               2                         90
    80               130               2                         105
    100              80                2                         90
    100              100               2                         100
    100              130               4                         115
    130              80                2                         105
    130              100               4                         115
    130              130               2                         130

We see that $\bar{X}$ has the value 90 for four of the 20 samples, so that
$P(\bar{X} = 90) = \frac{4}{20} = .2$. In this way, we obtain the following
probability table for $\bar{X}$, the sample mean:

    $\bar{x}$                  90     100    105    115    130
    $P(\bar{X} = \bar{x})$     .2     .1     .2     .4     .1

And the reader should now compute the mean and variance of $\bar{X}$ from
the table. He will find that

(6.16)    $\mu_{\bar{X}} = 108$,  $\sigma_{\bar{X}}^2 = 141$.

Note that we have here a particular example to which Theorem
6.2 applies. The results in (6.16) can be checked against those
predicted by the theorem. The values of $\mu_X$ and $\sigma_X^2$ are given in (6.15)
and with N = 5, n = 2 we find from (6.14) that

    $\mu_{\bar{X}} = 108$,  $\sigma_{\bar{X}}^2 = \frac{376}{2} \left( \frac{5 - 2}{5 - 1} \right) = 141$,

as expected.
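The arithmetic of Example 6.1 can also be reproduced by direct
enumeration. The sketch below is a check of that kind; it takes the
population to be one person with score 80 and two each with scores 100
and 130, which is what the counts in Table 34 and the values in (6.15)
imply. The helper name `mean_var` is our own.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import permutations

# Population of Example 6.1 (frequencies 1, 2, 2, as implied by Table 34).
scores = [80, 100, 100, 130, 130]
N, n = len(scores), 2

def mean_var(dist):
    """Mean and variance of a probability function given as {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    v = sum((x - m) ** 2 * p for x, p in dist.items())
    return m, v

# Population mean and variance, as in (6.15).
pop = {x: F(f, N) for x, f in Counter(scores).items()}
mu, var = mean_var(pop)

# All 5 * 4 = 20 ordered samples of two different people, equally likely.
samples = list(permutations(scores, n))
xbar = Counter()
for s in samples:
    xbar[F(sum(s), n)] += F(1, len(samples))

mu_bar, var_bar = mean_var(xbar)          # from the probability table of the sample mean
predicted = var / n * F(N - n, N - 1)     # right side of (6.14)

print(mu, var)                  # 108 376
print(dict(xbar))               # probability function of the sample mean
print(mu_bar, var_bar, predicted)  # 108 141 141
```

Note that `permutations` treats the five people as distinct even when
their scores coincide, which is exactly the counting used for the third
column of Table 34.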


Comparing the results obtained in Theorem 5.7 (sampling with
replacement) and Theorem 6.2 (sampling without replacement) we
draw the following conclusions.

(1) In both sampling with and without replacement, the values of
the sample mean $\bar{X}$ have the right "aim" in the sense that their
average value $\mu_{\bar{X}}$ is equal to the population mean $\mu_X$.

(2) In both sampling with and without replacement, if the sample
size is greater than 1, then the values of $\bar{X}$ show less spread than the
values of X about their common mean $\mu_X$; i.e.,

    $\sigma_{\bar{X}}^2 < \sigma_X^2$  if n > 1.

This follows by observing that $\sigma_{\bar{X}}^2$ is obtained by multiplying $\sigma_X^2$ by
the factor $\frac{1}{n}$ in one case, by the factor $\frac{1}{n} \left( \frac{N - n}{N - 1} \right)$ in the other case, and
both factors are less than 1 if n > 1.

(3) In both sampling with and without replacement, the variance
$\sigma_{\bar{X}}^2$ of the sample mean decreases as the sample size n increases. For
the factors mentioned in (2) are not only less than 1, but also decrease
as n increases. Furthermore, in sampling without replacement, if the
sample exhausts the population (n = N), then $\sigma_{\bar{X}}^2 = 0$. For if we
draw into the sample all the members of the population, then all
samples differ only in the order in which members are drawn. Hence
all samples have the same mean; i.e., there is only one possible value
of $\bar{X}$. In this case we know that the variance of $\bar{X}$ is zero.

(4) When sampling from the same population and for samples of
fixed size n > 1, $\sigma_{\bar{X}}^2$ is smaller when the sample is drawn without
replacement than when it is drawn with replacement. For the
variances differ by the factor $\frac{N - n}{N - 1}$, which is less than 1 when n > 1.

(5) Also, since

    $\frac{N - n}{N - 1} = \frac{1 - \frac{n}{N}}{1 - \frac{1}{N}}$,

this factor is close to 1 whenever N is very large compared to n. For
then n/N and 1/N are close to zero. Thus, if samples of size n are
drawn without replacement from a population of N elements, and if
the population size is very large compared to the sample size n, then
$\sigma_{\bar{X}}^2$ is approximately equal to $\sigma_X^2/n$. This accounts for the fact that
the simpler formulas (5.22) are often used in statistics when N is
very large compared to n, even though the sample is drawn without
replacement.
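Conclusions (3)-(5) all turn on the size of the factor (N - n)/(N - 1)
appearing in (6.14). A short sketch makes its behavior concrete; the
function names `fpc` and `factor_table`, and the particular population
sizes tried, are hypothetical choices for illustration.

```python
from fractions import Fraction as F

def fpc(N, n):
    """Factor (N - n)/(N - 1) that multiplies sigma_X^2 / n in (6.14)."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return F(N - n, N - 1)

def factor_table(n, sizes):
    """Pairs (N, factor as a decimal) for a fixed sample size n."""
    return [(N, float(fpc(N, n))) for N in sizes]

# Sample size fixed at n = 100; population size N grows.
rows = factor_table(100, [200, 1_000, 10_000, 1_000_000])
for N, factor in rows:
    print(N, round(factor, 6))

print(fpc(5, 2))    # 3/4, the factor used in Example 6.1
print(fpc(50, 50))  # 0: the sample exhausts the population
```

For n = 100 the factor is near one half when N = 200 but is very close
to 1 when N is a million, which is the situation in which the simpler
with-replacement formula is a safe approximation.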
We cannot here go into the question of how to use our results (in
conjunction with other information about the sample mean $\bar{X}$,
especially about its probability function) if just one sample is drawn
from a population and we then want to use the sample values of
$X_1, X_2, \ldots, X_n$ to make inferences about the population. This
question is of great practical importance and is discussed in detail in
statistics textbooks and courses.


From the definition of Cov(X, Y) in (6.2), we can see that the
covariance is a measure of the extent to which the values of X and Y
tend to increase or decrease together. If X has values greater than
its mean $\mu_X$ whenever Y has values greater than its mean $\mu_Y$ and X
has values less than $\mu_X$ whenever Y has values less than $\mu_Y$, then
$(X - \mu_X)(Y - \mu_Y)$ has positive values and Cov(X, Y) > 0. On the
other hand, if values of X are above $\mu_X$ whenever values of Y are
below $\mu_Y$ and vice versa, then Cov(X, Y) < 0. If X and Y are
independent, then we know by Theorem 5.4 that Cov(X, Y) = 0.

By a suitable choice of two random variables, we can make their
covariance any number we like. For example, if a and b are
constants, then

    $\mathrm{Cov}(aX, bY) = E(aXbY) - E(aX)E(bY) = abE(XY) - (a\mu_X)(b\mu_Y)$,

from which it follows that

(6.17)    $\mathrm{Cov}(aX, bY) = ab\,\mathrm{Cov}(X, Y)$.

It is now clear that if $\mathrm{Cov}(X, Y) \ne 0$, then by varying a and b we
can make Cov(aX, bY) positive or negative, as small or as large as
we please.

It is more convenient to have a measure of relation that cannot
vary so widely. We shall prove shortly that the covariance of X*
and Y*, where X* and Y* are the standardized random variables
corresponding to X and Y (as defined in Theorem 3.4), can vary only
between -1 and +1.

Now by (6.17),

    $\mathrm{Cov}(X^*, Y^*) = \mathrm{Cov}\left( \frac{X - \mu_X}{\sigma_X}, \frac{Y - \mu_Y}{\sigma_Y} \right)$
    $= \frac{1}{\sigma_X \sigma_Y} \mathrm{Cov}(X - \mu_X, Y - \mu_Y)$
    $= \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$,

this last equality following from (6.6). We are thus led to the
following definition.

Definition 6.2. Let X* and Y* be the standardized random
variables corresponding to X and Y. The covariance of X* and Y* is
called the correlation coefficient of X and Y and is denoted by $\rho(X, Y)$.
In symbols,

(6.18)    $\rho(X, Y) = \mathrm{Cov}(X^*, Y^*) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$.

If $\sigma_X = 0$ or if $\sigma_Y = 0$, and consequently (6.18) does not apply, we
define $\rho(X, Y) = 0$. The random variables X and Y are said to be
uncorrelated if and only if $\rho(X, Y) = 0$; otherwise they are said to be
correlated.


If $\sigma_X > 0$ and $\sigma_Y > 0$, then the correlation coefficient $\rho(X, Y)$ is
zero if and only if Cov(X, Y) = 0. In the exceptional case when one
or both of the random variables have standard deviation zero, we
know (see Problem 4.10) that X and Y are independent and hence
Cov(X, Y) = 0. Thus $\rho(X, Y) = 0$ and Cov(X, Y) = 0 are
equivalent conditions: X and Y are uncorrelated if and only if their
covariance is zero.

Before commenting on this definition, let us see how to compute a
correlation coefficient.


Example 6.2. Let X and Y be random variables with joint proba- 
bilities as given in Table 27 on p. 199. In Example 5.1, we found that 
ux = $, uy = $, and E(XY) = 1. Thus Cov(X, Y) = 1 and we
know that X and Y are correlated. We leave for the reader the
verification that E(X?) = 4 and E(Y?) = 3 so that o = 1 and o} = $
We now apply (6.18) to find 


p(X, Y) = ——— = whl .58, approximately. 


Example 6.3. In Example 5.4, we defined two random variables
X and Y and found that they were functionally dependent ($Y = X^2$),
but that (5.6) was true; i.e., Cov(X, Y) = 0. We therefore have an
example of random variables that are uncorrelated but not independent.
We conclude that one must exercise great care in interpreting the
covariance or the correlation coefficient as a measure of relationship
between values of X and Y. In particular, the fact that the
correlation coefficient is zero does not mean that X and Y are unrelated,
for we have just seen that $\rho(X, Y) = 0$ but X and Y are as strongly
related as they can be: knowing the value of X we are certain of the
value of Y, since $Y = X^2$.


Although it is merely a rephrasing of Theorem 5.4, we emphasize 
the point made in the last example by the following statement. 


Theorem 6.3. If X and Y are independent random variables, then 
they are uncorrelated, but not conversely. 


We turn now to some properties of the correlation coefficient. 


Theorem 6.4. The correlation coefficient of X and Y is a number
between -1 and +1 inclusive, i.e.,

(6.19)    $-1 \le \rho(X, Y) \le 1$.

Proof. Consider the variance of X* + Y*, where X* and Y* are
the standardized random variables corresponding to X and Y,
respectively. By (6.4),

    $\mathrm{Var}(X^* + Y^*) = \mathrm{Var}(X^*) + \mathrm{Var}(Y^*) + 2\,\mathrm{Cov}(X^*, Y^*)$.

But $\mathrm{Var}(X^*) = \mathrm{Var}(Y^*) = 1$ and $\mathrm{Cov}(X^*, Y^*) = \rho(X, Y)$ by
definition. Hence

(6.20)    $\mathrm{Var}(X^* + Y^*) = 2[1 + \rho(X, Y)]$.

Since $\mathrm{Var}(X^* + Y^*) \ge 0$, it follows that $-1 \le \rho(X, Y)$. Similarly,
the reader can show that

(6.21)    $\mathrm{Var}(X^* - Y^*) = 2[1 - \rho(X, Y)]$

from which we conclude, again since the variance is nonnegative, that
$\rho(X, Y) \le 1$. Thus the theorem is proved. (For another proof see
Problem 6.16.)


It is important to understand the meaning of the extreme values
$\rho(X, Y) = \pm 1$. Now the strongest relation exists between X and Y
when the value of Y is uniquely determined as soon as the value of X
is known. In such a case, Y is some function of X, say Y = g(X).
This situation exists whenever each row of the joint probability table
of X and Y has all entries but one equal to zero. As we saw in
Example 6.3, Y can be a function of X and yet $\rho(X, Y) = 0$. In that
example, $Y = X^2$, a quadratic function of X. But if Y is a linear
function of X, then we can prove that $\rho(X, Y)$ must have one of its
extreme values. And we shall also be able to prove that conversely,
if $\rho(X, Y) = \pm 1$, then X and Y are linearly related. Before stating
and proving these results, let us look at an example.


Example 6.4. Suppose the random variables X and Y have the
joint probabilities given in Table 35.


TABLE 35


Since each row contains exactly one nonzero entry, we know that Y
is some function of X. We also observe that all nonzero probabilities
occur on the diagonal along which the values of Y increase as the
values of X decrease. On the probability graph in Figure 26 we see
that the points (x, y) at which positive probabilities are indicated all
lie on the dotted straight line with negative slope. In fact, the joint
probability table was constructed assuming Y is the linear function
of X given by Y = -2X + 7. But let us calculate the correlation
coefficient of X and Y without using this fact. It is easy to find
directly from Table 35 that

    $\sigma_X = .7$,  $\sigma_Y = 1.4$,  $\mathrm{Cov}(X, Y) = -.98$.

Hence

    $\rho(X, Y) = \frac{-.98}{(.7)(1.4)} = -1$.


Figure 26. Probability graph of $P(X = x, Y = y)$.

Theorem 6.5. Let X be a random variable defined on a sample
space S and suppose $\sigma_X > 0$. Let Y be a linear function of X; i.e.,
Y = mX + b, where m and b are numbers and $m \ne 0$. Then
$\rho(X, Y) = +1$ if m > 0 and $\rho(X, Y) = -1$ if m < 0.


Proof. Since Y = mX + b, we have $\mu_Y = m\mu_X + b$. Hence
$Y - \mu_Y = m(X - \mu_X)$, from which it follows that

    $\mathrm{Cov}(X, Y) = E[m(X - \mu_X)^2] = m\sigma_X^2$.

Also we know from (3.12) that $\sigma_Y = |m|\sigma_X$. Thus

    $\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{m\sigma_X^2}{\sigma_X |m| \sigma_X} = \frac{m}{|m|}$.

Since m/|m| equals 1 if m > 0 and equals -1 if m < 0, the proof is
complete.
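The contrast between Example 6.3 and Theorem 6.5 can be seen numerically.
The sketch below computes the correlation coefficient from a joint
probability table by (6.18) and applies it to three tables built from
hypothetical probability functions of X: one with Y = -2X + 7 (the linear
function of Example 6.4, though not its actual table), one with
Y = 3X - 1, and one with Y equal to the square of a symmetric X. The
function name `correlation` and all the tables are our own assumptions.

```python
from fractions import Fraction as F
from math import sqrt

def correlation(joint):
    """rho(X, Y) by (6.18) from {(x, y): probability}; defined as 0 if a variance is 0."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    cov = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
    vx = sum((x - ex) ** 2 * p for (x, y), p in joint.items())
    vy = sum((y - ey) ** 2 * p for (x, y), p in joint.items())
    if vx == 0 or vy == 0:
        return 0.0
    return float(cov) / sqrt(float(vx) * float(vy))

# Hypothetical probability function of X.
px = {1: F(1, 10), 2: F(6, 10), 3: F(3, 10)}

# Y is a linear function of X: all probability lies on one straight line.
linear_down = {(x, -2 * x + 7): p for x, p in px.items()}  # slope m < 0
linear_up = {(x, 3 * x - 1): p for x, p in px.items()}     # slope m > 0

# Y = X^2 with X symmetric about 0: functionally dependent but uncorrelated.
symmetric = {-1: F(1, 4), 0: F(1, 2), 1: F(1, 4)}
quadratic = {(x, x * x): p for x, p in symmetric.items()}

print(correlation(linear_down))  # -1 up to rounding
print(correlation(linear_up))    # +1 up to rounding
print(correlation(quadratic))    # 0.0
```

The sums are carried out in exact fractions and only the final square
root is taken in floating point, so the linear cases agree with the
extreme values to within rounding error.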


Before we can prove the converse of Theorem 6.5, we must be
careful to isolate a minor difficulty. We are going to want to prove
that if $\rho(X, Y) = \pm 1$, then Y is a linear function of X; i.e.,
Y = mX + b for some numbers m and b. What this means is that
$Y(o_i) = mX(o_i) + b$ for each $o_i \in S$. But this is more than we can
rightfully expect to prove. For suppose some simple event, say $\{o_1\}$,
is assigned probability 0. Then we may as well forget about the
element $o_1$, for $X(o_1)$ is not one of the possible values of X unless it is
the value of X for some other element $o_i \in S$ for which $P(\{o_i\}) > 0$.
In any case, the element $o_1$ could have been deleted from our sample
space, since it plays no role in the construction of the joint
probability table of X and Y. Thus, changing the value $Y(o_1)$ cannot
change the correlation coefficient of X and Y. This means that our
best hope is to be able to prove that $Y(o_i) = mX(o_i) + b$ for all
$o_i \in S$ except possibly for elements of S that together make up an
event with probability 0. Let us introduce the following handy
terminology for this state of affairs: We shall say that two random
variables are equal with probability 1 whenever their values are equal for
all elements of the sample space S except possibly for elements that
together make up an event with probability 0.

Now we can state and prove the converse of Theorem 6.5.


Theorem 6.6. Let X and Y be random variables defined on a
sample space S, and suppose $\rho(X, Y) = \pm 1$. Then Y is a linear
function of X with probability 1. In fact, numbers m > 0, b, and c
exist so that Y = mX + b if $\rho = +1$ and Y = -mX + c if $\rho = -1$,
each with probability 1.

Proof. Suppose $\rho(X, Y) = +1$ and proceed from Equation (6.21).
We see that $\mathrm{Var}(X^* - Y^*) = 0$ and it follows that X* - Y* has
one value that occurs with probability 1. This value must be the
mean of X* - Y*, which is zero since the mean of each standardized
random variable is zero. Thus X* = Y* with probability 1 or

    $\frac{X - \mu_X}{\sigma_X} = \frac{Y - \mu_Y}{\sigma_Y}$.

Simplification yields the desired result Y = mX + b with

    $m = \frac{\sigma_Y}{\sigma_X} > 0$  and  $b = \frac{\mu_Y \sigma_X - \mu_X \sigma_Y}{\sigma_X}$.

The reader can complete the proof if $\rho(X, Y) = -1$ by starting with
(6.20) and proceeding as above.


As these theorems hint, the correlation coefficient is a meaningful
measure of relationship between values of X and Y only when this
relationship is a linear one. For a fuller understanding of this fact,
one must study the so-called regression functions of each random
variable on the other. These functions are defined and some of their
properties most often used in statistics are stated in Problem 6.20.


6.1. 
6.2. 


6.3. 


6.4. 


6.5. 


6.6 
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PROBLEMS 


Prove Theorem 6.1. 


From the population of N = 10 people whose incomes are given in the 
frequency table of Problem 5.15, a random sample of size n = 2 is 
drawn without replacement. If X is the sample mean income, then 
determine the probability function of X, calculate us and o%, and thus 
check Formulas (6.14). Compare the results with those of Problem 
5.15, where the sample is drawn with replacement. 


From the population of N = 5 people whose IQ scores are given 1n 
Table 33, a random sample of size n is drawn without replacement. 
Let X be the sample mean IQ score, Determine the probability func- 
tion of X, calculate ug and ož, and check Formulas (6.14) if the sample 
size is (a) n = 3, (b n = 4, (c) n = 5. (The corresponding problem 
with n = 2 was solved in Example 6.1.) 


Refer to Example 5.6 and 
chips with relative frequen 
i.e., one chip has 


Suppose we have a population of N — 10 
cies as given in the probability table of xi 
—1 on it, five chips have 0’s on them, and four chips 
have 2's on them. From this population, a random sample of size ? 
is drawn without replacement. If X is the sample mean, then determine 
the probability function of X, calculate Lg and o% if (a) n= 1, (b) 


n = 2, (c) n = 3, (d) n = 9, (e) n = 10. In cach case check Formulas 
(6.14). 


In a population of 10,000 families, annual income has mean $5000 and 
standard deviation $750. According to Chebyshev’s inequality, within 
what interval will the sample mean X fall with probability at least v 


if a random sample of size 100 is drawn without replacement from the 
population? 


The following parts refer to the text discussion of sampling without 
replacement from a finite population. 


(a) Prove (6.12) for k = 3, 4, ---, n and thus complete the proof that 
Xy, Xs, +++, X, are identically distributed. 
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6.7. 


6.8. 


6.9. 


6.10. 


(b) Determine the joint probability function of X; and Xs, and also 
of X, and X;. What is the joint probability function of X; and X, 
for any j # k? 
- 1 B 
(c) Show that p(Xi, X) = “WaT Is this answer reasonable? 
For sampling without replacement from a population of size N, you 
want to determine a sample size m such that the standard deviation 
of the sample mean is half as big as it is for samples of size n. Show that 


= nN 
N + 3n 


provided the right hand side is an integer. (Note that nı and n are 
equal when n = N, and explain why this is reasonable.) 


7 


Refer to the card guessing experiment described in Example 5.5, but 
now suppose that the subject chooses at random a permutation of the 
numbers 1, 2, +--+, n and then calls his guesses in the order specified 
by the selected permutation. As before, one way of doing this would 
be for the subject to have a duplicate deck. But now he makes his 
guesses by selecting a random sample of n cards, one by one without 
replacement from his deck. 

Let X denote the random variable whose value is the number of 
Correct guesses made by the subject, and thus write 


X=XitX:t e +X, 
where X, has the value 0 or 1 according as the kth guess (trial) is 
wrong or right. 


(a) Show that Xi, X», +-+, X, are not independent. 
(b) Prove that Xi, X», +++, X, are identically distributed, the prob- 
ability of a correct guess at the kth trial being 1/n for k = 1, 2, 


s The 

(c) Show that E(X) = 1, as in Example 5.5. 

P us 

(d) Prove that Cov(X;, X,) = Sh—D 

(e) Show that Var(X) = 1, a somewhat higher variance than in Ex- 
ample 5.5. 


for all j # k. 


Show that all other things being equal, the greater the correlation 
coefficient of two random variables, the greater the variance of their 
sum and the less the variance of their difference. 


The average covariance of $X_1, X_2, \ldots, X_n$ is denoted by
$\mathrm{Cov}_{av}(X_1, \ldots, X_n)$ and defined as the sum of all
$\mathrm{Cov}(X_j, X_k)$ with $1 \le j < k \le n$ divided by $\binom{n}{2}$,
the number of such covariances.

(a) Show that

    $\mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{j=1}^{n} \mathrm{Var}(X_j) + \frac{n-1}{n}\, \mathrm{Cov}_{av}(X_1, \ldots, X_n).$

(b) Suppose a number K (not depending on n) exists such that
    $\mathrm{Var}(X_j) < K$ for all j = 1, 2, ..., n. Suppose further that as n
    increases without bound, $\mathrm{Cov}_{av}(X_1, \ldots, X_n)$ approaches some
    limiting value, say C. Show that then the variance of $\bar{X}$ also
    approaches C.


6.11. Let X and Y be the characteristic random variables (see the definition
in Problem 1.14) of events A and B, respectively. Find $\rho(X, Y)$ and
determine whether X and Y are independent if

(a) P(A) = 4, P(A|B) = 4, P(B|A) = 3.
(b) P(A) = 3, P(A|B) = 3, P(B|A) = 3.


6.12. Let X be the larger of the two numbers and Y be the sum of the
numbers showing when two fair dice are rolled. Find $\rho(X, Y)$. (Cf.
Problem 4.4.)


6.13. Let X be the number of empty cells and Y the number of objects in the
first cell when three indistinguishable objects are randomly distributed
into three numbered cells. Find $\rho(X, Y)$. (Cf. Problem 4.3.)


6.14. A fair die is rolled two independent times. Let X and Y denote the
number of points showing on the first roll and the second roll,
respectively. Define U = X + Y and V = X - Y. Show that U and V are
dependent random variables, but that $\rho(U, V) = 0$.


6.15. A fair coin is tossed four independent times. Let X be the number of
heads obtained on the first two tosses and Y be the total number of
heads. Find the joint probability table of X and Y and compute
$\rho(X, Y)$.


6.16. (An alternate proof of Theorem 6.4.) Let $U = X - \mu_X$ and
$V = Y - \mu_Y$.

(a) Note that $y = E([zU + V]^2) \ge 0$ for all real z.
(b) Expand and obtain

    $y = \sigma_X^2 z^2 + 2\, \mathrm{Cov}(X, Y)\, z + \sigma_Y^2 \ge 0.$

(c) Interpret the inequality in (b) as showing that a certain parabola
    in the z-y plane lies entirely above the z-axis or has exactly one
    point of contact with the z-axis.
(d) Conclude that the quadratic equation y = 0 has either no real roots
    or two equal real roots. Hence the discriminant of the quadratic
    equation must be negative or zero. Thus find $-1 \le \rho(X, Y) \le 1$.


6.17. Suppose Y is a linear function of X. Let $\rho_m$ be the value of
$\rho(X, Y)$ when Y = mX + b. Draw a graph showing how $\rho_m$ depends upon
the slope m. (Plot m along the horizontal axis.)


6.18. Let X and Y be random variables and suppose a, b, c, d are any
numbers provided only that $a \ne 0$, $c \ne 0$. Show that

    $|\rho(aX + b, cY + d)| = |\rho(X, Y)|.$

[Note: This result shows that the absolute value of the correlation
coefficient is not altered by a change in location of the origin or a
change in scale on either x or y axis. This is a property expected of any
reasonable measure of relationship. For example, the correlation
between weight and height will have the same absolute value whether we
measure height in inches, in feet, or in tenths of an inch above or
below 68 inches. If a and c have opposite signs, then we see that
$\rho(aX + b, cY + d) = -\rho(X, Y)$. Is this change of sign reasonable?]


6.19. Suppose X and Y each have only two possible values. Prove that, if X
and Y are uncorrelated, then they are also independent.


6.20. The conditional mean of Y given $X = x_j$ is denoted by
$E(Y \mid X = x_j)$ and defined for j = 1, 2, ..., M by the equation

    $E(Y \mid X = x_j) = \sum_{k=1}^{N} y_k f(y_k \mid x_j).$

(Note: Conditional probability functions are defined in Definition 4.3,
p. 206.)
(a) Show that if X and Y are independent, then
    $E(Y \mid X = x_j) = E(Y)$ for j = 1, 2, ..., M.
(b) For any random variables X and Y, show that

    $E(Y) = \sum_{j=1}^{M} E(Y \mid X = x_j) f(x_j).$

(c) The conditional mean of X given $Y = y_k$ is denoted by
    $E(X \mid Y = y_k)$ and defined for k = 1, 2, ..., N by the equation

    $E(X \mid Y = y_k) = \sum_{j=1}^{M} x_j f(x_j \mid y_k).$

    State and prove the results analogous to those in parts (a) and (b),
    but now referring to the conditional mean of X given $Y = y_k$.
(d) The regression function of Y on X is defined as the function whose
    domain is the set of possible values of X and whose value at $x_j$
    is the conditional mean $E(Y \mid X = x_j)$ for j = 1, 2, ..., M.
    Similarly, the regression function of X on Y has the set of possible
    values of Y as domain, and its value at $y_k$ is the conditional mean
    $E(X \mid Y = y_k)$ for k = 1, 2, ..., N. The regression graph of Y on
    X is a set of M points in the x-y plane, the point with x-coordinate
    $x_j$ having y-coordinate $E(Y \mid X = x_j)$. Similarly, the regression
    graph of X on Y is the set of N points $(E(X \mid Y = y_k), y_k)$ for
    k = 1, 2, ..., N.

    A regression function is said to be linear if all the points of the
    corresponding regression graph lie on a straight line. Otherwise,
    a regression function is said to be nonlinear.

    (i) For X and Y with joint probabilities given in Table 27 (p. 199),
        show that both regression functions are linear.
    (ii) For X and Y with joint probabilities given in Table 31 (p. 218),
        show that the regression function of Y on X is nonlinear, but
        the regression function of X on Y is linear.
    (iii) Construct a joint probability table so that both regression
        functions are nonlinear.

(e) Suppose the regression function of Y on X is linear; i.e., constants
    m and b exist such that for j = 1, 2, ..., M,

    (*)  $E(Y \mid X = x_j) = \sum_{k=1}^{N} y_k f(y_k \mid x_j) = m x_j + b.$

    To evaluate m and b, proceed as follows. First multiply (*) by $f(x_j)$
    and add the resulting equations for j = 1, 2, ..., M. Obtain
    $\mu_Y = m \mu_X + b$. Then multiply (*) by $x_j f(x_j)$ and add all M
    equations as before. Obtain $E(XY) = m E(X^2) + b \mu_X$. Solve these
    simultaneous linear equations and thus determine m and b. Finally,
    show that the linear regression function of Y on X can be written
    in the following form:

    (**)  $E(Y \mid X = x_j) = \mu_Y + \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} (x_j - \mu_X).$

    Conclude that the points of the graph of the linear regression
    function lie on a straight line passing through the point
    $(\mu_X, \mu_Y)$ and that this line is horizontal if and only if X and Y
    are uncorrelated.

(f) The experiment is performed and we are given the incomplete
    information that the value of X is $x_j$. We want to estimate the
    value of Y. Suppose we use a "least-squares" criterion; i.e., we
    seek an estimate, say $c_j$, such that the mean squared deviation of
    values of Y from the estimated value $c_j$ will be as small as possible.
    In symbols, we seek the number $c_j$ which minimizes

    $E[(Y - c_j)^2 \mid X = x_j] = \sum_{k=1}^{N} (y_k - c_j)^2 f(y_k \mid x_j).$

    Show that this least-squares estimate is the conditional mean of Y
    given $X = x_j$; i.e., show that $c_j = E(Y \mid X = x_j)$. [Hint: The proof
    follows immediately from the property of the mean stated in
    Problem 3.8.]

(g) For any pair of values $(x_j, y_k)$, the error made by using the estimate
    $E(Y \mid X = x_j)$ in place of $y_k$ is the difference
    $y_k - E(Y \mid X = x_j)$. The mean squared error, denoted by $\sigma_e^2$,
    is therefore the sum

    $\sigma_e^2 = \sum_{\text{all } j,k} [y_k - E(Y \mid X = x_j)]^2 f(x_j, y_k).$

    Show that if the estimate is given by the linear regression function
    (**), then

    $\sigma_e^2 = (1 - \rho^2) \sigma_Y^2,$

    and thus conclude that this mean squared error decreases and
    approaches zero as $\rho(X, Y)$ approaches either +1 or -1.

(h) State and prove the results analogous to those in parts (e)-(g), but
    now supposing the regression function of X on Y is linear and we
    are given that the value of Y is $y_k$.
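
The definitions of conditional mean and regression function can be tried out on a small joint probability table. The sketch below is an illustration under stated assumptions: the joint table `joint` is made up for this example (it is not one of the tables in the text), and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint probabilities f(x, y); not taken from the text.
joint = {
    (0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6),
    (1, 0): Fraction(1, 12), (1, 1): Fraction(1, 4),
    (2, 0): Fraction(1, 12), (2, 1): Fraction(1, 4),
}


def marginal_x(joint):
    """f(x_j): sum the joint probabilities over all values of Y."""
    fx = {}
    for (x, _y), prob in joint.items():
        fx[x] = fx.get(x, 0) + prob
    return fx


def regression_of_y_on_x(joint):
    """Map each x_j to E(Y | X = x_j) = sum_k y_k f(y_k | x_j)."""
    fx = marginal_x(joint)
    cond_mean = {x: Fraction(0) for x in fx}
    for (x, y), prob in joint.items():
        cond_mean[x] += y * prob / fx[x]
    return cond_mean


fx = marginal_x(joint)
reg = regression_of_y_on_x(joint)
mean_y = sum(y * prob for (_x, y), prob in joint.items())
# Part (b): E(Y) equals the f(x_j)-weighted sum of the conditional means.
total = sum(reg[x] * fx[x] for x in fx)
print(reg, mean_y, total)
```

For this made-up table the regression graph of Y on X has the three points (0, 1/2), (1, 3/4), (2, 3/4), which do not lie on one line, so this regression function is nonlinear.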
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Chapter 5 


BINOMIAL DISTRIBUTION 
AND SOME APPLICATIONS 


1. Bernoulli trials and the binomial distribution 


Certain kinds of experiments and associated random variables occur
time and again in the theory of probability and in its applications.
They are therefore made the object of special study in which their
properties are explored, values of frequently needed probabilities are
tabulated, and so on. In this section, we describe a number of such
experiments and random variables, paying special attention to the
so-called binomial probability function. In the final sections of this
chapter, we discuss two important problems of statistics in which
this function plays a central role.

As we have seen in numerous examples throughout this book, many
problems involve experiments made up of a number, say n, of
individual trials. Each trial is itself really an arbitrary experiment,
and is therefore defined in the mathematical theory by some sample
space and assignment of probabilities to its simple events. The trials
can be independent or dependent, and the simple events of the sample
space for the n-trial experiment are assigned probabilities accordingly.

Although each trial may have many possible outcomes, we are
often interested only in whether a certain result occurs or not. For
example, a machine turns out parts which are classified defective or
good; a card is selected from a standard deck and it is an ace or not
an ace; two dice are rolled and the sum of the numbers showing is
seven or is different from seven; a student selected from the senior
class has a part-time job or has not; etc.

In order to have a convenient standard terminology for discussing 
all such trials, we shall call one of the two possible results of each 
trial a success and the other a failure. Which result is called a success 
is, of course, completely arbitrary—whether one calls a defective part 
a success or a failure, or a student with a part-time job a success or a 
failure, is a matter of taste as far as the theory goes. We must how- 
ever make sure that we are consistent in our language in any one 
problem. 

If when a trial is performed we are interested solely in whether a
success or a failure results, then it is sensible to make the sample
space defining the trial reflect this fact by containing just two
elements, say S for success and F for failure. If the simple event {S}
is given probability p, then an acceptable assignment of probabilities
is determined for every choice of the number p, provided only that
$0 \le p \le 1$. Writing q = 1 - p for convenience, we have


(1.1)  $P(\{S\}) = p, \quad P(\{F\}) = q, \quad p + q = 1.$


As an example, consider drawing a card at random from a standard
deck. Ordinarily we define as sample space a set containing 52
elements (one for each card) and assign probability 1/52 to each simple
event. But if we are interested only in whether or not the card is an
ace, and we call drawing an ace a success and drawing any other face
value a failure, then we prefer to use {S, F} as sample space, with
p = 1/13 and q = 12/13 as probability of success and failure, respectively.


Definition 1.1. Trials are called Bernoulli trials (after James
Bernoulli, 1654-1705) if and only if they meet the following
conditions:

(1) Each trial is defined by the sample space {S, F}; i.e., we
consider that each trial has only two outcomes: either S (success) or F
(failure).

(2) The same assignment of probabilities, as given in (1.1), is made
to the simple events of each trial; i.e., the probability of a success is
the same on each trial and is denoted by p.

(3) The trials are independent.

A sequence of any (not necessarily Bernoulli) trials can be thought
of as a process in which outcomes of the individual trials are produced
as the trials are performed. A process of this kind is called a stochastic
(= probability) or random process, since the particular sequence of
outcomes obtained depends upon chance. A random process made
up of Bernoulli trials is called a Bernoulli process.
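
A Bernoulli process is easy to simulate, which may help make Definition 1.1 concrete. The sketch below is an illustration only: the function name `bernoulli_process` is hypothetical, and Python's pseudo-random generator stands in for the chance mechanism.

```python
import random


def bernoulli_process(n, p, rng=None):
    """Return the outcomes of n simulated Bernoulli trials as a list of 'S'/'F'.

    Each trial uses the same success probability p (condition 2), and the
    draws are made independently of one another (condition 3).
    """
    rng = rng or random.Random()
    return ["S" if rng.random() < p else "F" for _ in range(n)]


# Ten simulated tosses of a fair coin, with heads labeled as success.
outcomes = bernoulli_process(10, 0.5, random.Random(1))
print("".join(outcomes), outcomes.count("S"))
```

Passing a seeded `random.Random` makes a run repeatable; the particular sequence obtained still depends on the seed chosen.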


Example 1.1. Tossing a coin 100 independent times is interpreted
to mean 100 Bernoulli trials in which each trial (toss of the coin)
results in success (say, heads) or failure (tails), and the probability p
of a head is the same for all 100 tosses. If the coin is fair, then p = 1/2
and q = 1/2; if the coin is biased, then $p \ne 1/2$.


Example 1.2. Consider a manufacturing process in which a metal
part is produced by an automatic machine. Suppose each part in a
production run of 500 parts can be classified upon inspection as
defective or good. We can think of the production of a part as a single
trial which results in success (say, a defective part) or in failure (a
good part). If we believe that the machine operation is just as likely
to produce a defective on one trial as on any other, and if we also
believe that the occurrence of a defective on any trial is made neither
more nor less likely by the particular results obtained on the preceding
trials, then it is reasonable to assume that the production run is a
Bernoulli process with 500 trials. (The probability p of a defective
on each trial is called the average fraction defective of the process.)

Of course, the Bernoulli process is a mathematical idealization of
the actual production process. For example, if the machine setting
wears down as the run proceeds, then the tendency of the machine to
produce defectives will increase as time goes on and the probability p
is therefore not the same for all 500 trials. It is clear that a real
manufacturing process cannot be exactly represented by a Bernoulli
process. Nevertheless, it is often closely approximated by such a
process, and useful results are obtained by means of this idealization.


Example 1.3. The sample space for an experiment made up of three
Bernoulli trials with probability p for success on each trial is the
Cartesian product set {S, F} x {S, F} x {S, F} containing 2^3 = 8
three-tuples as elements. These three-tuples and the probabilities of
the corresponding simple events, obtained by use of the product rule
of Chapter 2, Section 9 (since the trials are independent), are listed
in the first two columns of Table 36. The number of successes
obtained in this experiment, denoted by $S_3$, is a random variable whose
possible values are 0, 1, 2, 3. The probability function of the random
variable $S_3$ is determined in the last two columns of Table 36. Note
that the probabilities in the last column are the terms in the binomial
expansion of $(q + p)^3$. Since p + q = 1, it follows that the sum of
these probabilities, as expected, is indeed 1.


TABLE 36

Outcome of    Probability of Corresponding    Possible Value    P(S_3 = k)
Experiment    Simple Event                    of S_3, k

FFF           qqq = q^3                       0                 q^3

FFS           qqp = pq^2
FSF           qpq = pq^2                      1                 3pq^2
SFF           pqq = pq^2

FSS           qpp = p^2 q
SFS           pqp = p^2 q                     2                 3p^2 q
SSF           ppq = p^2 q

SSS           ppp = p^3                       3                 p^3
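
The construction of Table 36 can be reproduced mechanically: list every n-tuple of S's and F's, multiply the trial probabilities, and collect the tuples by their number of S's. The sketch below is illustrative; the function name `success_distribution` and the choice p = 1/3 are assumptions made only for this example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def success_distribution(n, p):
    """Probability function of S_n, built by enumerating all 2**n outcomes."""
    q = 1 - p
    dist = {k: Fraction(0) for k in range(n + 1)}
    for outcome in product("SF", repeat=n):
        k = outcome.count("S")
        # Product rule for independent trials: p for each S, q for each F.
        dist[k] += p ** k * q ** (n - k)
    return dist


p = Fraction(1, 3)
dist = success_distribution(3, p)
for k, prob in dist.items():
    # Compare with the corresponding term of the expansion of (q + p)^3.
    print(k, prob, comb(3, k) * p ** k * (1 - p) ** (3 - k))
```

Enumeration grows as 2^n, so it is practical only for small n; the general argument in the text replaces it by counting tuples.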


The general argument about to be made is modeled on this last
example. (The reader may find it helpful at this point to review
Example 9.5 and Problems 9.1-9.7 in Chapter 2, where other special
cases were presented.) The sample space for an experiment made up
of n Bernoulli trials is the Cartesian product set


    {S, F} x {S, F} x ... x {S, F}


containing $2^n$ n-tuples as elements. Every n-tuple represents an
outcome of the n-trial experiment and is made up of n symbols, each
an S or an F. Since the trials are independent, the product rule of
Chapter 2, Section 9 applies and, taking account of (1.1), we deduce
that the probability of any simple event whose n-tuple contains k S's
and n - k F's (in any order) is $p^k q^{n-k}$ for k = 0, 1, ..., n. One such
n-tuple is determined by selecting the k trials on which S's occur from
among all n trials. Since this can be done in $\binom{n}{k}$ different ways, we
conclude that there are $\binom{n}{k}$ n-tuples containing k S's and n - k F's,
and that the corresponding simple events all have the same
probability, namely $p^k q^{n-k}$.

As in Example 1.3, we are interested in determining the probability
function of the random variable whose value is the total number of
successes obtained in the n-trial experiment. This random variable
is denoted by $S_n$ and clearly has possible values 0, 1, ..., n. Now
$S_n = k$, where k is any one of these possible values, is the event for
which exactly k S's (and therefore n - k F's) occur. This event is
the union of the $\binom{n}{k}$ simple events determined by n-tuples with k S's
and n - k F's. As we observed, each such simple event has
probability $p^k q^{n-k}$. Hence


(1.2)  $P(S_n = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \ldots, n.$

We have therefore proved the following result.


Theorem 1.1. Suppose an experiment consists of n Bernoulli trials
with probability p for success on each trial. If $S_n$ is the random
variable whose value for any outcome of the experiment is the total
number of successes obtained, then the probability function of $S_n$ is
given by (1.2).


For given values of n and p, the probability function defined by
(1.2) is called the binomial probability function or the binomial
distribution* with parameters n and p. Formula (1.2) thus defines not
just one binomial distribution, but a whole family of binomial
distributions, one for every possible pair of values for n and p. To show
the dependence of the probabilities on the parameters, we shall write
b(k | n, p) for the probability in (1.2). Thus


(1.3)  $b(k \mid n, p) = P(S_n = k) = \binom{n}{k} p^k q^{n-k}$


is the probability of exactly k successes, given the parameters n and p
of the binomial distribution; i.e., b(k | n, p) is the probability of
exactly k successes in n Bernoulli trials with probability p for success
on each trial. The random variable $S_n$ is said to be binomially
distributed with parameters n and p when $S_n$ has the probability
distribution defined by (1.3).

* For the random variables ... distribution and probability ... the
standard terminology, and we use it from now on. Note that one customarily
shortens binomial probability distribution to binomial distribution.

The name binomial distribution arises from the fact that the
probabilities b(k | n, p) for k = 0, 1, ..., n are the terms in the binomial
expansion of $(q + p)^n$. (See Chapter 3, Section 2, for the binomial
theorem and related identities involving binomial coefficients.) It
follows since p + q = 1 that


(1.4)  $\sum_{k=0}^{n} b(k \mid n, p) = (q + p)^n = 1,$

as required for a probability function.


Example 1.4. If a fair coin is tossed six times, the probability of
getting exactly five heads is


    $b(5 \mid 6, \tfrac{1}{2}) = \binom{6}{5} (\tfrac{1}{2})^5 (\tfrac{1}{2})^1 = \frac{6}{64} = .09375.$


The probability of at least five heads is obtained by adding the
probability of five heads and the probability of six heads. Since


    $b(6 \mid 6, \tfrac{1}{2}) = \binom{6}{6} (\tfrac{1}{2})^6 (\tfrac{1}{2})^0 = \frac{1}{64} = .015625,$


it follows that the probability of at least five heads is

    $P(S_6 \ge 5) = b(5 \mid 6, \tfrac{1}{2}) + b(6 \mid 6, \tfrac{1}{2}) = .109375,$

where we write $S_6$ for the random variable denoting the total number
of heads (successes) among the six tosses.
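
Formula (1.3) translates directly into a few lines of code. The sketch below is only an illustration: the function name `b` mirrors the notation b(k | n, p) of the text, and exact fractions are used so that the values of Example 1.4 come out without rounding.

```python
from fractions import Fraction
from math import comb


def b(k, n, p):
    """Binomial probability b(k | n, p): exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


half = Fraction(1, 2)
exactly_five = b(5, 6, half)              # 6/64
all_six = b(6, 6, half)                   # 1/64
at_least_five = exactly_five + all_six    # 7/64
total = sum(b(k, 6, half) for k in range(7))  # formula (1.4)

print(float(exactly_five), float(all_six), float(at_least_five), total)
```

Passing a `Fraction` for p keeps every probability exact; passing a float such as 0.5 works as well but yields floating-point results.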


Example 1.5. Five percent of the metal parts produced by a
machine are defective, the other 95 percent are good. How many parts
must be produced in order for the probability of at least one defective
to be 1/2 or more? We assume that the production of parts is a Bernoulli
process for which each trial (producing one part) results in a success
(defective part) or failure (good part). The probability p for success
on any trial is given as p = .05. We seek the smallest integer n such
that $P(S_n \ge 1) \ge 1/2$. Now


    $P(S_n \ge 1) = 1 - P(S_n = 0) = 1 - b(0 \mid n, .05)$
    $= 1 - \binom{n}{0} (.05)^0 (.95)^n = 1 - (.95)^n,$

so that we want the smallest integer n for which $1 - (.95)^n \ge 1/2$ or
$(.95)^n \le 1/2$. Using logarithms (Table 22, p. 141) we find
$n \log (.95) \le -\log 2$, from which $n \ge 13.5$ approximately. Hence n = 14
is the smallest lot size that can be used in order to have an even chance
or better of finding at least one defective part in the lot.
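
The search for the smallest lot size in Example 1.5 can be carried out either by stepping through n or by the logarithm computation used in the text. The sketch below shows both; the function names are hypothetical and chosen only for this illustration.

```python
import math


def smallest_n_by_search(p, target):
    """Smallest n with P(S_n >= 1) = 1 - (1 - p)**n at least `target`."""
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


def smallest_n_by_logs(p, target):
    """Same quantity from n * log(1 - p) <= log(1 - target)."""
    # Dividing by log(1 - p), which is negative, reverses the inequality.
    return math.ceil(math.log(1 - target) / math.log(1 - p))


print(smallest_n_by_search(0.05, 0.5), smallest_n_by_logs(0.05, 0.5))
```

The logarithm version can be thrown off by floating-point rounding when the ratio of logarithms is very close to an integer; the direct search avoids that at the cost of a loop.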


For many applications, it is necessary to compute the probability
not of exactly r successes, but of at least r or at most r successes. Since
such cumulative probabilities are obtained by computing all the
included individual probabilities and adding, this task soon becomes
laborious. For example, to compute the probability of at least six
successes in ten Bernoulli trials with p = .3, we must compute
b(k | 10, .3) for k = 6, 7, 8, 9, 10 and then add these five probabilities.
Fortunately, extensive tables are available to lighten the task of
such computations.*


A small table of binomial probabilities is included here (Table 37)
for our use. Wherever possible, examples and problems from now on
will be formulated with numerical values for the parameters n and p
that will allow the table to be used to find required probabilities.
Note that we have tabulated $P(S_n \ge r)$, the probability of r or more
(at least r) successes for n = 1, 2, ..., 10, 20 and for p = .01, .05,
.10, .20, .30, .40, .50. For each pair of values for n and p and each
possible value of r we read


    $P(S_n \ge r) = b(r \mid n, p) + b(r + 1 \mid n, p) + \cdots + b(n \mid n, p)$

directly from the table. (We do not include a row for r = 0, since
$P(S_n \ge 0) = 1$ for all n and p.) We illustrate the use of this table
in the following examples.


Example 1.6. In Example 1.4, we computed $P(S_6 \ge 5)$ for p = 1/2
and found the answer .109375. In our table, for n = 6, r = 5, p = .50,
we read .109, which agrees to three decimals with the exact answer.
To find $P(S_6 = 5) = b(5 \mid 6, \tfrac{1}{2})$ we note that


    $P(S_6 = 5) = P(S_6 \ge 5) - P(S_6 \ge 6),$


* See Tables of the Cumulative Binomial Probability Distribution, Annals of the 
Computation Laboratory of Harvard University, vol. XXXV, Harvard Univer- 
sity Press, 1955; Tables of the Binomial Probability Distribution, National Bureau 
of Standards, Applied Mathematics Series, vol. 6, 1950; H. C. Romig, 50-100 
Binomial Tables, John Wiley and Sons, Inc., 1953. 


TABLE 37. Cumulative Binomial Probabilities


The entry is $P(S_n \ge r) = \sum_{k=r}^{n} b(k \mid n, p)$. Missing entries are less than .0005.


 n   r  p=.01  p=.05  p=.10  p=.20  p=.30  p=.40  p=.50
 1   1   .010   .050   .100   .200   .300   .400   .500
 2   1   .020   .098   .190   .360   .510   .640   .750
     2          .002   .010   .040   .090   .160   .250
 3   1   .030   .143   .271   .488   .657   .784   .875
     2          .007   .028   .104   .216   .352   .500
     3                 .001   .008   .027   .064   .125
 4   1   .039   .185   .344   .590   .760   .870   .938
     2   .001   .014   .052   .181   .348   .525   .688
     3                 .004   .027   .084   .179   .312
     4                        .002   .008   .026   .062
 5   1   .049   .226   .410   .672   .832   .922   .969
     2   .001   .023   .081   .263   .472   .663   .812
     3          .001   .009   .058   .163   .317   .500
     4                        .007   .031   .087   .188
     5                               .002   .010   .031
 6   1   .059   .265   .469   .738   .882   .953   .984
     2   .001   .033   .114   .345   .580   .767   .891
     3          .002   .016   .099   .256   .456   .656
     4                 .001   .017   .070   .179   .344
     5                        .002   .011   .041   .109
     6                               .001   .004   .016
 7   1   .068   .302   .522   .790   .918   .972   .992
     2   .002   .044   .150   .423   .671   .841   .938
     3          .004   .026   .148   .353   .580   .773
     4                 .003   .033   .126   .290   .500
     5                        .005   .029   .096   .227
     6                               .004   .019   .062
     7                                      .002   .008
 8   1   .077   .337   .570   .832   .942   .983   .996
     2   .003   .057   .187   .497   .745   .894   .965
     3          .006   .038   .203   .448   .685   .855
     4                 .005   .056   .194   .406   .637
     5                        .010   .058   .174   .363
     6                        .001   .011   .050   .145
     7                               .001   .009   .035
     8                                      .001   .004
 9   1   .086   .370   .613   .866   .960   .990   .998
     2   .003   .071   .225   .564   .804   .929   .980
     3          .008   .053   .262   .537   .768   .910
     4          .001   .008   .086   .270   .517   .746
     5                 .001   .020   .099   .267   .500
     6                        .003   .025   .099   .254
     7                               .004   .025   .090
     8                                      .004   .020
     9                                             .002
10   1   .096   .401   .651   .893   .972   .994   .999
     2   .004   .086   .264   .624   .851   .954   .989
     3          .012   .070   .322   .617   .833   .945
     4          .001   .013   .121   .350   .618   .828
     5                 .002   .033   .150   .367   .623
     6                        .006   .047   .166   .377
     7                        .001   .011   .055   .172
     8                               .002   .012   .055
     9                                      .002   .011
    10                                             .001
20   1   .182   .642   .878   .988   .999  1.000  1.000
     2   .017   .264   .608   .931   .992   .999  1.000
     3   .001   .075   .323   .794   .965   .996  1.000
     4          .016   .133   .589   .893   .984   .999
     5          .003   .043   .370   .762   .949   .994
     6                 .011   .196   .584   .874   .979
     7                 .002   .087   .392   .750   .942
     8                        .032   .228   .584   .868
     9                        .010   .113   .404   .748

since the event $(S_6 \ge 5)$ is the union of the mutually exclusive events
$(S_6 = 5)$ and $(S_6 \ge 6)$. These cumulative probabilities are read
directly from the table for n = 6, r = 5 and n = 6, r = 6, using the
column headed p = .50. We find

    $P(S_6 = 5) = .109 - .016 = .093,$


as compared to the exact answer .09375 computed in Example 1.4.
The exact answer when rounded to three decimals is .094 rather than
.093 as obtained from the table. Since each cumulative probability
in the table is itself a rounded figure, such discrepancies are to be
expected when subtracting tabular entries.


Example 1.7. To find $P(S_{10} \le 3)$ when p = .20 we write

    $P(S_{10} \le 3) = 1 - P(S_{10} \ge 4),$


and read this cumulative probability under p = .20 and in the row
labeled n = 10, r = 4. We get


    $P(S_{10} \le 3) = 1 - .121 = .879.$


Example 1.8. To use the table when p > .50, rephrase the problem 
in terms of q = 1 — p. For example, to find the probability of at 
least seven successes in ten Bernoulli trials with p = .80, we compute 
instead the equal probability of at most three failures in ten trials, 
but now entering the table with the probability appropriate to a 
failure, namely p = .20. This probability was computed in the pre- 
ceding example. 

Note that this method amounts to relabeling the two results of 
each trial so that S and F are interchanged. If the probability of a 
"success" is initially greater than .5, then after the relabeling it is 
less than .5 and the problem is reformulated in terms of this new 
language before using the table. (The formal identity used to justify 
this intuitively clear procedure is given in Problem 1.13.) 
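
The table lookups in Examples 1.6 to 1.8 can be cross-checked by summing binomial probabilities directly. The sketch below is illustrative only; the helper names `at_least` and `at_most` are hypothetical and not notation from the text.

```python
from math import comb


def b(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def at_least(r, n, p):
    """Cumulative probability P(S_n >= r), the quantity tabulated in Table 37."""
    return sum(b(k, n, p) for k in range(r, n + 1))


def at_most(r, n, p):
    """P(S_n <= r), obtained from the complementary event S_n >= r + 1."""
    return 1 - at_least(r + 1, n, p)


# Example 1.7: at most three successes in ten trials with p = .20.
print(round(at_least(4, 10, 0.2), 3), round(at_most(3, 10, 0.2), 3))

# Example 1.8: at least seven successes with p = .80 equals
# at most three failures, computed with success probability .20.
print(at_least(7, 10, 0.8), at_most(3, 10, 0.2))
```

Because the sums here are not rounded before subtraction, differences such as P(S_6 >= 5) - P(S_6 >= 6) do not suffer the small discrepancy noted in Example 1.6.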


Example 1.9. Two teams, A and B, compete in a series of games.
Each trial (play of one game in the series) can result in success (say,
A wins) or failure (B wins). If we assume that the probability p
that A wins is the same for all games in the series and that the games
are independent trials, then a Bernoulli process serves as a
mathematical model of the series competition.

The probability p is taken as a measure of the relative strength of
the two teams. If p > 1/2, team A is better than team B; if p = 1/2, the
teams are evenly matched; if p < 1/2, team B is better than team A.
How does the kind of series affect the probability that the better team
wins the series? For example, a tie at the end of the regular baseball
season between two National League teams is broken by a three-game
series in which the team first to win two games is declared the pennant
winner. The American League breaks ties by having the teams play
a single game. World Series competition, however, is a seven-game
series in which the team first to win four games is the winner. We
feel intuitively that a superior team has a better chance of showing
its superiority in a seven-game series than in a three-game series or
in a single game against the same opponent.*

Although the World Series ends as soon as one team wins four
games, we could imagine it continued to the full seven games.
Winning the series is equivalent to winning at least four of the seven games,
and we can therefore use our table of cumulative binomial
probabilities to compute the probability of a team winning the series for
various values of p. For example, if p = .30, then we enter the table
for n = 7, r = 4 and read the probability .126 for team A to win the
series. If p = .90, then the probability that team A wins the series
is not directly available from the table. Instead, we read the
probability that team B wins (entering the table for p = .10 appropriate
to the new meaning of a "success") and find .003. Hence, team A
wins the series with probability .997 if p = .90. In Table 38, we
summarize these computations for various values of p and for series
containing an odd number of games, the winner being required to win a
majority of the games. These probabilities are graphed in Figure 27,
and we can see how increasing the number of games in the series
decreases the probability of a poorer team winning (the graphs get
lower if p < .5) and increases the probability of a better team
winning (the graphs get higher if p > .5). The fact that all five graphs
are very close together around p = .5 means that, if one team is only
slightly better than the other (say p = .51), then a nine-game series
is not very much more effective than a single game as a discriminator
between the teams. In fact, with p = .51 it turns out that the better
team wins a nine-game series with probability .525, which is only
slightly higher than .51, the probability that it wins a single game.
Put differently, this means that the poorer team will win the
nine-game series roughly 47.5% of the time in spite of the fact that it
faces a superior opponent. Of course, one reduces the probability
that the series will erroneously be won by the poorer team by
increasing the number of games in the series. (Similar ideas appear in a
variety of statistical problems, as we shall see in the next section.)


TABLE 38. Probability that Team A Wins an n-Game Series
(Rows: probability of Team A winning a single game. Columns: number
of games in series. The entries of the table are not reproduced here.)


[Figure 27. Probability of winning the series plotted against the
probability p of winning each game, with one graph for each number of
games in the series.]


* For a complete discussion of this and related points, see F. Mosteller, "The
World Series Competition," Journal of the American Statistical Association,
vol. 47 (1952), pp. 355-380.
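
The series probabilities quoted in Example 1.9 can be recomputed directly. The sketch below is an illustration under the assumptions of the example (independent games, constant p, an odd number of games imagined played to completion); the function name `series_win_probability` and the particular values of p printed are choices made for this example only.

```python
from math import comb


def series_win_probability(p, games):
    """Probability that a team with single-game win probability p wins a
    majority of `games` games (`games` odd), i.e. P(S_n >= (n + 1)/2)."""
    need = games // 2 + 1
    return sum(
        comb(games, k) * p ** k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


for p in (0.30, 0.51, 0.90):
    row = [round(series_win_probability(p, n), 3) for n in (1, 3, 5, 7, 9)]
    print(p, row)
```

Each printed row gives the series-winning probability for 1, 3, 5, 7 and 9 games, which is the kind of summary Table 38 describes.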


We turn now to a discussion of some properties of the random
variable $S_n$. In particular, we want to determine the mean and variance
of this binomially distributed random variable. Since the probability
function of $S_n$ is given in (1.2), we could use the definitions of mean
and variance given in the preceding chapter to compute $E(S_n)$ and
$\mathrm{Var}(S_n)$. We would then have to evaluate the sums


(1.5)  $E(S_n) = \sum_{k=0}^{n} k\, b(k \mid n, p) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}$

and

(1.6)  $E(S_n^2) = \sum_{k=0}^{n} k^2\, b(k \mid n, p) = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k q^{n-k},$


from which we compute the variance of $S_n$ by use of the formula

(1.7)  $\mathrm{Var}(S_n) = E(S_n^2) - [E(S_n)]^2.$


This way of calculating the mean and variance of $S_n$ is direct and
offers useful practice in the manipulation of binomial coefficients.
But we choose to leave this for the problems and instead present an
alternate derivation which gives added insight into the nature of the
binomial distribution.

Suppose (cf. Section 5 of the preceding chapter) we have a
population specified by the random variable X whose probability table is
as follows:


(1.8)      x       0    1
           f(x)    q    p


X is here interpreted as the number of successes in a single trial, and
we simulate a Bernoulli process by drawing random samples with
replacement from this population, thinking of the occurrence of a success
as corresponding to x = 1 and the occurrence of a failure as
corresponding to x = 0. Indeed, if $X_k$ is the kth sample value obtained,
then the sum $X_1 + \cdots + X_n$ is precisely the number of ones in a
random sample of size n or equivalently, the number of successes in
n Bernoulli trials with probability p for success on each trial. Hence
the sample mean $\bar{X}$ is related to $S_n$ by the formula


(1.9)  $\bar{X} = \frac{S_n}{n}$  or  $S_n = n\bar{X},$


and we can compute $E(S_n)$ and $\mathrm{Var}(S_n)$ by using Theorem 5.7 of the
preceding chapter, since from (1.9) we have

(1.10)  $E(S_n) = n \mu_{\bar{X}}$  and  $\mathrm{Var}(S_n) = n^2 \sigma_{\bar{X}}^2.$

Now the population mean and variance are easily determined from
(1.8) to be

(1.11)  $\mu_X = p, \quad \sigma_X^2 = pq.$

Hence


    $E(S_n) = n \mu_{\bar{X}} = n \mu_X = np$

and

    $\mathrm{Var}(S_n) = n^2 \sigma_{\bar{X}}^2 = n^2 \frac{\sigma_X^2}{n} = npq.$

We have thus proved the following important result. 


Theorem 1.2. A binomially distributed random variable with
parameters n and p has mean np, variance npq, and standard
deviation $\sqrt{npq}$.
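
Theorem 1.2 can also be confirmed numerically by evaluating the sums (1.5) and (1.6) directly and applying (1.7). The sketch below does this with exact fractions; the function name `mean_and_variance` and the parameter values n = 10, p = 3/10 are assumptions chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def mean_and_variance(n, p):
    """Evaluate (1.5), (1.6) and (1.7) by direct summation over k = 0..n."""
    q = 1 - p
    probs = [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))               # (1.5)
    second_moment = sum(k * k * pk for k, pk in enumerate(probs))  # (1.6)
    return mean, second_moment - mean ** 2                         # (1.7)


n, p = 10, Fraction(3, 10)
mean, variance = mean_and_variance(n, p)
print(mean, n * p)                   # both should equal np
print(variance, n * p * (1 - p))     # both should equal npq
```

A numerical check of this kind covers only the parameter values tried; the derivation in the text is what establishes the result for all n and p.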


Example 1.10. In 100 families containing four children, the num- 
ber of families that had 0, 1, 2, 3, 4 girls were recorded as in the follow- 
ing frequency table: 


Number of Girls in Family 


Number of Families 4 31 


If the probability of giving birth to a girl is assumed constant, then 
how can we use these data to estimate this unknown probability? 
We think of the sexes of the children in each family as being determined
by four Bernoulli trials with the probability p of success
(female child) fixed but unknown. Using this theoretical binomial
distribution, Theorem 1.2 tells us that the mean number of girls in
a family of four children is 4p. To estimate p we adopt the following
procedure: set the mean of the theoretical binomial probability distribution
equal to the mean of the observed frequency distribution.

The mean number of girls in the 100 families is

$$\frac{0(4) + 1(31) + 2(35) + 3(25) + 4(5)}{100} = 1.96.$$

Hence, according to the estimation procedure just stated, we equate
4p and 1.96 to obtain

$$\hat{p} = \frac{1.96}{4} = .49,$$

where we write $\hat{p}$ to denote an estimate of p based on the particular
data given in this problem.


TABLE 39

Number of Girls    Observed     "Fitted" Binomial    Theoretically Expected
in Family          Frequency    Probabilities        Frequencies
k                               b(k|4, .49)          100b(k|4, .49)

0                   4           .068                  6.8
1                  31           .260                 26.0
2                  35           .375                 37.5
3                  25           .240                 24.0
4                   5           .058                  5.8


The binomial distribution with parameters n = 4 and $p = \hat{p} = .49$
is said to be "fitted" to the observed frequency distribution. From
the fitted binomial distribution, we can compute the probabilities
$b(k \mid 4, \hat{p})$ and thus the theoretically expected frequencies
$100\,b(k \mid 4, \hat{p})$ for k = 0, 1, 2, 3, 4 and then can compare these
with the actually observed frequencies to see how good a "fit" we have.
The result is given in Table 39. How to test the "goodness of fit" between
observed and theoretically expected frequencies as well as how to appraise
the given estimation procedure as compared with other possible procedures
are problems of great importance in statistics, but we cannot
go into these matters here.
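The estimation procedure of Example 1.10 is short enough to carry out by machine. The sketch below uses the observed frequencies from the example; the variable names are our own, and the printed columns are meant to be compared with Table 39.

```python
from math import comb

# Observed data from Example 1.10: number of families having k girls.
observed = {0: 4, 1: 31, 2: 35, 3: 25, 4: 5}
n_children = 4
n_families = sum(observed.values())

# Mean of the observed frequency distribution.
sample_mean = sum(k * f for k, f in observed.items()) / n_families

# Equate the theoretical mean 4p to the observed mean and solve for p.
p_hat = sample_mean / n_children

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fitted probabilities and theoretically expected frequencies.
fitted = {k: b(k, n_children, p_hat) for k in observed}
expected = {k: n_families * prob for k, prob in fitted.items()}

print("estimate of p:", round(p_hat, 2))
for k in sorted(observed):
    print(k, observed[k], round(fitted[k], 3), round(expected[k], 1))
```

Setting a theoretical mean equal to an observed mean is only one possible estimation procedure; the code makes no claim that it is the best one.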


Using Theorem 1.2, the standardized random variable corresponding
to $S_n$ is seen to be

(1.12)    $S_n^* = \dfrac{S_n - np}{\sqrt{npq}}$.

The event $-c \le S_n^* \le c$ is the same as

(1.13)    $np - c\sqrt{npq} \le S_n \le np + c\sqrt{npq}$

and occurs when the number of successes in n Bernoulli trials differs
from the mean np by no more than c standard deviations. According
to Chebyshev's inequality, this probability is greater than $1 - (1/c^2)$,
but we know that this estimate is not very helpful.

Much stronger results are available. Indeed, it can be shown by
advanced methods that for large values of n, $P(-c \le S_n^* \le c)$ is
closely approximated by the area under a certain bell-shaped curve
known as the normal probability curve. For example, although
Chebyshev's inequality tells us only that $P(-1 \le S_n^* \le 1) > 0$, the
normal curve approximation tells us that this probability is about
.68 if n is large. Similarly, we learn that $P(-2 \le S_n^* \le 2)$ is about
.95 and $P(-3 \le S_n^* \le 3)$ is about .997, so that the number of
successes in n Bernoulli trials is almost certain to be within three
standard deviations of its mean if n is large. Unfortunately, we cannot do
any more here than mention these results which are of such great
practical and theoretical significance in probability. We do however
give one illustrative example.
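To see how the three statements compare, one can compute the exact binomial probability of the event (1.13) and set it beside the Chebyshev bound and the normal-curve area. This is a sketch under our own assumptions: the parameters n = 100 and p = .2 are illustrative, and the normal area is obtained from the error function in Python's standard library rather than from a printed table.

```python
from math import comb, sqrt, erf, ceil, floor

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_within(c, n, p):
    """Exact P(-c <= S_n* <= c), i.e. S_n within c standard deviations of np."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    # A small tolerance guards against rounding error at the endpoints.
    lo = max(0, ceil(mean - c * sd - 1e-9))
    hi = min(n, floor(mean + c * sd + 1e-9))
    return sum(b(k, n, p) for k in range(lo, hi + 1))

def chebyshev_bound(c):
    """Lower bound 1 - 1/c^2 supplied by Chebyshev's inequality."""
    return 1 - 1 / c**2

def normal_area(c):
    """Area under the standard normal curve between -c and c."""
    return erf(c / sqrt(2))

n, p = 100, 0.2  # illustrative parameters (an assumption)
for c in (1, 2, 3):
    print(c, round(chebyshev_bound(c), 3), round(normal_area(c), 3),
          round(prob_within(c, n, p), 3))
```

Because $S_n$ takes only integer values, the exact probability need not coincide with the normal-curve area; the agreement is approximate and improves as n grows.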


Example 1.11. A coin is tossed 400 times and falls heads 210 times.
If the coin is fair, is it unlikely to get this many heads? The number
of heads is binomially distributed with parameters n = 400 and, we
assume, $p = \tfrac{1}{2}$. Hence the mean number of heads is 200 and the
standard deviation is $\sqrt{400(\tfrac{1}{2})(\tfrac{1}{2})} = 10$. The probability that the
number of heads is between 190 and 210 is approximately .68, since
this is a one standard deviation interval on either side of the mean.
To get as many as 210 heads is therefore not at all unlikely, on the
assumption that the coin is fair. Similarly, we find that with a fair
coin it is almost certain (probability .997) that the number of heads
will fall within three standard deviations, or 30, on either side of the
mean, i.e., in the range 170 to 230. Obtaining a number of heads less
than 170 or more than 230 would therefore throw grave doubts on
the hypothesis that the coin is fair.
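The figures .68 and .997 in Example 1.11 are normal-curve approximations. For a fair coin the exact binomial probabilities can be found by counting, as in the following sketch (the function name is our own choice). The exact values differ somewhat from the approximations because the number of heads is a whole number.

```python
from math import comb

def prob_heads_between(lo, hi, n=400):
    """Exact probability of between lo and hi heads (inclusive) in n fair tosses."""
    favorable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2**n

one_sd = prob_heads_between(190, 210)      # within one standard deviation
three_sd = prob_heads_between(170, 230)    # within three standard deviations
at_least_210 = prob_heads_between(210, 400)

print("P(190 <= heads <= 210):", round(one_sd, 3))
print("P(170 <= heads <= 230):", round(three_sd, 4))
print("P(heads >= 210):", round(at_least_210, 3))
```

The last line gives the chance of a result at least as extreme as the 210 heads actually observed, which supports the conclusion that such a result is not at all unlikely for a fair coin.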
PROBLEMS


1.1. If the production of parts by a machine is regarded as a Bernoulli
process with process average defective equal to p = .20, is it more
likely to have (a) no defectives among ten parts, or (b) at most one
defective among 20 parts?


1.2. In a 20-question true-false examination, suppose a student tosses a
coin to determine his answer to each question. If the coin falls heads,
he answers “true”; if it falls tails, he answers “false.” Find the
probability that he answers at least 12 questions correctly and thus passes
the exam.

1.3. The probability of having no ace in a bridge hand is approximately .30.
What is the probability that a person who plays ten hands of bridge will
never receive an ace?

1.4. From the cumulative probabilities given in Table 37, determine the
probability function of a binomially distributed random variable with
parameters (a) n = 10 and p = .3, (b) n = 10 and p = .7, (c) n = 1
and p = .5.


1.5. Let X be a binomially distributed random variable with mean 12 and
variance 4.8. Find (a) P(X > 5), (b) P(5 < X < 10), (c) P(X < 10).

1.6. A man is to throw a fair coin a certain number of independent times
and is to receive a prize if he throws exactly five heads. At the outset,
he is to choose the number of throws he will make. What number
should he choose in order to maximize his chances of winning the
prize? What then are the odds for his winning the prize?


1.7. For n = 20 Bernoulli trials, determine

(a) $P(S_{20} > 12)$ for p = 0.7.

(b) $P(0 < S_{20} < 14)$ for p = 0.6.

(c) The value of p for which $P(S_{20} > 8) = .50$. (Hint: Interpolate
between two values found in the table.)


1.8. What is the probability of throwing exactly nine heads exactly twice in
five throws of ten fair coins? (Hint: Use the binomial distribution twice.)
1.9. How many Bernoulli trials with probability .01 for success must be
performed in order that the probability of at least one success be $\tfrac{1}{2}$ or
more?


1.10. We are given the information that n Bernoulli trials resulted in exactly
k successes. Show that the conditional probability of a success on any
particular trial is k/n.
1.11. In order to decide whether to accept or reject a very large lot of items
offered for sale, the buyer takes a sample of 20 items at random from 
the lot and tests them. If at most one defective is found, he accepts 
the entire lot; if more than one defective is discovered in the sample, 
he rejects the lot. 

(a) Find the probability that the buyer accepts the lot if in fact it 
contains a proportion of defectives equal to p, where p assumes 
the values in Table 37. 

(b) Graph the probability that the buyer accepts the lot against the 
proportion of defectives, showing the probability of acceptance on 
the vertical axis. (This is called an operating characteristic curve 
or OC curve for the single-sample decision rule adopted by the 
buyer.) 

(c) Draw the operating characteristic curve for the following alterna- 
tive single-sample decision rule: a sample of only ten items is drawn 
at random from the lot and tested. The lot is accepted if no 
defectives are found and rejected otherwise. 

(d) Where in your analysis of this problem have you used the fact that 
the lot is very large? 

1.12. (a) Prove the following recursion formula for binomial probabilities:

$$b(k+1 \mid n, p) = \frac{(n-k)\,p}{(k+1)\,q}\; b(k \mid n, p).$$

(b) Denote by m the unique integer for which

$$(n+1)p - 1 < m \le (n+1)p.$$

If (n + 1)p is not an integer, show that as k goes from 0 to n,
b(k|n, p) increases up to a maximum value which occurs for k = m
and then decreases. But if m = (n + 1)p, then show that b(k|n, p)
increases up to b(m - 1|n, p) which is equal to b(m|n, p), and then
decreases.

(c) Use Table 37 to compute the binomial probabilities for n = 4,
p = .4 and n = 5, p = .4. For these special cases, check the
assertions made in (b).

(d) The number m defined in (b) is called the most probable number of
successes in n Bernoulli trials with probability of success equal to p.
Determine b(m|n, p) for n = 20, p = .10 and n = 20, p = .50.
Does the most probable number of successes occur with high
probability?

1.13. Show that

(a) $b(k \mid n, p) = b(n-k \mid n, 1-p)$,

(b) $\displaystyle\sum_{k=r}^{n} b(k \mid n, p) = 1 - \sum_{k=n-r+1}^{n} b(k \mid n, 1-p)$.

Interpret these formulas in words and show how they are used in
relation to Table 37.


1.14. Show that

$$b(k \mid n+1, p) = p\,b(k-1 \mid n, p) + q\,b(k \mid n, p)$$

and interpret this formula in words. Show how the formula can be used
to extend Table 37 to n = 11.

1.15. (a) Compute $P(-c \le S_n^* \le c)$ for c = 1, 2, 3 if n = 5 and p = .20,
and compare with the corresponding normal curve approximations.
(b) Repeat part (a), but with n = 10 and then n = 20.


1.16. Compute $E(S_n)$ and $\mathrm{Var}(S_n)$ by evaluating the sums in (1.5) and (1.6).

1.17. The function G whose value for every real number t is given by

$$G(t) = \sum_{k=0}^{n} b(k \mid n, p)\, t^k$$

is called the generating function of the binomial distribution with
parameters n and p (or of the random variable $S_n$). Show that
$G(t) = (q + pt)^n$. [Note: Let those readers who know some differential
calculus show from the definition of G that $G'(1) = E(S_n)$ and
$G''(1) + G'(1) = E(S_n^2)$, where $G'$ and $G''$ are the first and second
derivatives of G, respectively. By computing these derivatives from
the explicit expression for G obtained in this problem, derive formulas
for the mean and variance of $S_n$.]

1.18. Consider a finite population [...] number 0 or 1, there being Nq 0's
and Np 1's, where p + q = 1. For [...] parts that are good (0) or [...]
n is drawn with replacement, then we have seen that the number of 1's
[...], and let $Y_n$ be the random variable
whose value is the number of 1's in the sample. The probability function
of $Y_n$ will depend on n, p, and N, and we indicate this dependence
by writing

$$P(Y_n = k) = h(k \mid n, p, N).$$

$Y_n$ is said to have a hypergeometric probability distribution with
parameters n, p, and N.

(a) Show that

$$h(k \mid n, p, N) = \frac{\dbinom{Np}{k}\dbinom{Nq}{n-k}}{\dbinom{N}{n}}.$$

(Note: Recall the convention concerning binomial coefficients made in
Formula (2.10) of Chapter 3.)

(b) Show that the sum of the probabilities h(k|n, p, N) taken over all
possible values of $Y_n$ is equal to 1, as required of a probability
function. (Hint: Use Formula (2.11) of Chapter 3.)

(c) Our notation has been chosen so that the population of N objects
has the relative frequency table given in (1.8). If $\bar{X}$ is the sample
mean obtained in selecting the random sample, show that $Y_n = n\bar{X}$.
Now use Theorem 6.2 of the preceding chapter to conclude that
the mean and variance of $Y_n$ are given by

$$E(Y_n) = np, \qquad \mathrm{Var}(Y_n) = npq\left(\frac{N-n}{N-1}\right).$$


(d) Show that as the population size N increases without bound, the
hypergeometric distribution with parameters n, p, and N approaches
the binomial distribution with parameters n and p. In symbols,

$$h(k \mid n, p, N) \to b(k \mid n, p) \quad \text{as } N \to \infty.$$

The importance of this limit theorem lies in the fact that when n/N
is small enough, binomial probabilities can be used as approximations
to hypergeometric probabilities. (Hint: Write out the binomial
coefficients in (a) and thus show that h(k|n, p, N) is equal to

$$\binom{n}{k}\,\frac{Np(Np-1)\cdots(Np-k+1)\;\cdot\;Nq(Nq-1)\cdots(Nq-n+k+1)}{N(N-1)\cdots(N-n+1)}.$$

Now note what happens to each factor as $N \to \infty$.)

(e) Suppose a sample of size n is drawn without replacement from N
objects of which Np are defective and Nq are good. In practice,
one often knows N but the proportion defective p is unknown. If
one obtains k defectives in the sample, then what is a reasonable
estimate for this unknown proportion p? Since Np must be an
integer, p is necessarily of the form j/N for some choice of j from
among the integers 0, 1, ..., N. Estimating p is therefore equivalent
to finding an estimate for the integer j.

Now the probability of getting exactly k defectives depends only
on j once k, n, and N are fixed. Let us write

$$h_j = h\!\left(k \,\Big|\, n, \frac{j}{N}, N\right)$$

for this probability. The method known as maximum-likelihood
estimation directs us to find that value of j, say $\hat{j}$, such that $h_j$ is as
large as possible. In other words, the probability of getting the
experimental outcome actually obtained (i.e., exactly k defectives)
is maximized if $j = \hat{j}$. The number $\hat{p} = \hat{j}/N$ is then called the
maximum likelihood estimate of the unknown proportion defective
p. To find $\hat{p}$, proceed as follows:

(i) Show that

$$\frac{h_j}{h_{j-1}} = \frac{j\,(N - j + 1 - n + k)}{(j - k)(N - j + 1)}.$$

(ii) Show that $\dfrac{h_j}{h_{j-1}}$ is greater than 1 if $j < \dfrac{k(N+1)}{n}$ and is less
than 1 if $j > \dfrac{k(N+1)}{n}$.

(iii) Conclude that if $\hat{j}$ is the greatest integer less than or equal to
$\dfrac{k(N+1)}{n}$, then the maximum likelihood estimate of p is given
by $\hat{p} = \hat{j}/N$.

(f) Repeat the preceding problem, but now assume the sample is drawn
with replacement, so that the binomial distribution applies. Show
that the maximum likelihood estimate of p is given by $\hat{p} = k/n$,
the actual proportion defective found in the sample.


2. Testing a statistical hypothesis 


In this section, we illustrate how the binomial distribution is used 
in a problem of statistical inference. We cannot here go into the gen- 
eral theory of hypothesis testing in statistics. Instead, we analyze 
one particular example in detail, in order to point out the highlights 
of the method of testing hypotheses. 

The Committee for the Re-Election of Smith as Mayor is meeting 
well ahead of election day to discuss campaign strategy. Smith, as 
the incumbent, is felt to have the edge on his opponent, but the com- 
mittee wants some more information about this advantage as a guide 
to deciding whether to plan a very vigorous and expensive campaign
or a less vigorous and less expensive campaign. Since Smith's opponent
is going all out to win, and will undoubtedly reduce Smith's
advantage during the campaign, the committee decides that they
will raise funds for the more expensive campaign if 60% or less of
the population is in favor of Smith, but that they will relax and wage
the less expensive campaign if Smith has more than 60% of the voters
on his side.

Let us denote by p the actual proportion of all voters in favor of
Smith. If p were known, then the committee would have no problem.
It would (according to its agreed-upon plan) decide on one course of
action if $p \le .60$ and on the alternative course of action if $p > .60$.
But p is unknown, and some evidence will have to be obtained in
order to choose between the two possible types of campaigns.

It is customary in statistics to say that there are two hypotheses, 
namely $p \le .60$ and $p > .60$, and the procedure by which a choice
is made between these hypotheses is called a test of one of the hy- 
potheses against the other. The hypothesis that is tested is called the 
null hypothesis; the other is then called the alternate hypothesis. 
Although the committee will choose to accept one hypothesis or the 
other, it is customary to say instead that the committee’s choice is 
between acceptance or rejection of the null hypothesis. We shall 
shortly make some comments about which hypothesis is to be taken 
as the null hypothesis, but for now let us make the following agree- 
ment: 


Null Hypothesis: $p \le .60$;
Alternate Hypothesis: $p > .60$.


The decision to accept or reject the null hypothesis will be based 
on the result of an experiment in which a certain number of people, 
say n, are randomly selected from the population of voters and then 
asked whether they are for or against candidate Smith. We cannot 
here discuss the very important practical problem of how to design 
a sample survey or opinion poll of this kind. But we shall assume that 
the selection of people is made in such a way that the process of sam- 
pling can reasonably be idealized as a Bernoulli process in which each 
trial (asking one of the selected people his voting intention) results 
in a success (will vote for Smith) or a failure (will not vote for Smith), 
the probability of a success on each trial being p, the proportion of 
people in the entire population who favor Smith. (Since the sample
is ordinarily drawn without replacement, this theoretical model will 
be appropriate only if the sample size is very small compared to the 
number of voters in the entire population.) 

Let us suppose that a sample of n = 20 people is drawn at random 
from the population and that each person is asked his voting intention.
(We take so small a sample for illustrative purposes only; larger
samples are discussed later.) For these n = 20 Bernoulli trials, let X
denote the number of successes (people who say they favor Smith)
obtained. Then X is binomially distributed with parameters n = 20
and p, but p is unknown. Our null and alternate hypotheses are thus
statements about a parameter of a probability distribution. Such
hypotheses are called statistical hypotheses.

The committee decision to accept or reject the null hypothesis will 
be based on the outcome of the poll of the 20 people in the sample, 
in particular, on the value of X obtained. Roughly speaking, the 
committee will act on the assumption that p is low (and therefore 
accept the null hypothesis) if the value of X is small, and it will act 
on the assumption that p is high (and therefore reject the null hy- 
pothesis) if the value of X is large. But this is quite vague, and it is 
clear that what we need is a rule that unequivocally prescribes the 
committee’s decision for each possible outcome of the poll. Consider 
for the moment the following example of such a decision rule: 


Reject the null hypothesis if and only if
at least 13 of the 20 people in the sample
say they are in favor of candidate Smith.

Note that the decision rule is completely described by giving the 
values of X that result in rejection of the null hypothesis. These 
values are called the critical set of values of X for the given decision 
rule. If the observed outcome falls in the critical set, the null hy- 
pothesis is rejected; otherwise the null hypothesis is accepted. 

Now the null hypothesis is, as a matter of fact, either true or false. 
And our decision rule leads either to acceptance or rejection of the 
null hypothesis. Hence the following possibilities can arise by use 
of this decision rule: 
(1) The null hypothesis is actually true and the value of X does
not fall in the critical set; i.e., the null hypothesis is accepted.

(2) The null hypothesis is actually true and the value of X falls
in the critical set; i.e., the null hypothesis is rejected.

(3) The null hypothesis is actually false and the value of X does
not fall in the critical set; i.e., the null hypothesis is accepted.

(4) The null hypothesis is actually false and the value of X falls
in the critical set; i.e., the null hypothesis is rejected.

Now in (1) and (4) the action taken is the correct one, since the
committee does indeed want to accept the null hypothesis when it is
true and reject the null hypothesis when it is false. But in (2) and (3)
the action taken is incorrect: case (2) is said to be an error of the first
kind or a type I error; case (3) is said to be an error of the second kind
or a type II error.


If the committee makes an error of the first kind, then it will with
a false sense of confidence wage a mild and less expensive campaign,
even though Smith has no more than 60% of the voters on his side.
This error, although it saves money, may lead to the defeat of
Smith at the polls. If the committee makes an error of the second
kind, then it will with a false sense of urgency wage a very expensive 
campaign, even though Smith has the support of more than 60% of 
the voters. This error leads to spending money for an expensive cam- 
paign which the committee, if it knew that Smith has such support, 
would regard as unnecessary. 

Since the committee is dedicated to Smith's re-election at all costs, 
the consequences of an error of the first kind are considered much 
more serious than the consequences of an error of the second kind. 
This fact accounts for our choice of $p \le .60$ as the null hypothesis
and $p > .60$ as the alternate hypothesis, rather than vice versa. For
it is customary to formulate the null hypothesis so that rejecting it
when it is true (error of first kind) is more serious than accepting it
when it is false (error of second kind). For example, in testing a new
drug there are the two hypotheses “drug is toxic” and “drug is not
toxic.” The former would be taken as the null hypothesis, since
rejecting it when it is true will lead to deaths of patients, whereas
accepting this hypothesis when it is false will have the less undesirable
consequences of loss of money by the manufacturer and unnecessary
waste of the drug. Of course, in cases where the two kinds of errors
are of the same importance, it is immaterial which of the two
hypotheses is called the null hypothesis.

To study the decision rule stated above (for which n = 20 and the
null hypothesis is rejected if and only if the value of X is at least 13)
it is convenient to define the function $\pi$ whose value for each possible
value of the parameter p is the probability of rejecting the null
hypothesis. That is,

$$\pi(p) = P(X \ge 13) = \sum_{k=13}^{20} b(k \mid 20, p).$$

The function $\pi$ is called the power function of the given decision rule.
The reader should check the following values of this power function
by referring to Table 37. (We write 0+ for a positive number less
than .0005, and 1- for a number greater than .9995 but less than 1.)


  p      0     .10    .20    .30    .40    .50    .60    .70    .80    .90     1

 π(p)    …     0+     0+    .001   .021     …      …      …      …      …      …


[Graph of the power function $\pi(p) = \sum_{k=13}^{20} b(k \mid 20, p)$, with the regions "null hypothesis true" and "null hypothesis false" marked along the p-axis.]

Figure 28


The graph of this power function is shown in Figure 28, where we
also indicate the graph (consisting of two horizontal line segments)
of the power function for an ideal decision rule, defined as a rule for
which the probabilities of errors of both the first and second kind
are zero. Since we are plotting the probability of rejecting the null
hypothesis, we find in Figure 28 that this ideal power function has
[...] $p > .60$, the difference in heights of the
[...] of an error of the second kind.

For the decision rule whose power function is graphed in Figure 28,
we observe that as p increases from p = 0 to p = .6, the probability
of an error of the first kind increases from 0 to .416. Similarly, as we
move to the left from p = 1, the probability of an error of the second
kind increases from 0 to .584 as we approach the borderline value
p = .6.

The committee is not very happy with this decision rule, for it in- 
volves high error probabilities. For example, even if candidate Smith 
is favored by only 50% of the voting population, the probability is 
.132 that there will be at least 13 people in favor of Smith among the
20 people in the sample, thus leading the committee to plan a weak 
campaign when a strong one is clearly required. And if the percentage 
favoring candidate Smith is less than but near 60%, the committee 
is appalled to find that the decision rule will lead to a wrong decision 
roughly 40% of the time it is used. And even if candidate Smith is 
comfortably in the lead with, let us say, 70% of the voters on his 
side, the sample of 20 will contain less than 13 in favor of Smith with 
probability .228 and the decision rule will then lead the committee
to erroneously wage a strong expensive campaign. 
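The error probabilities just quoted can be recomputed from the definition of the power function instead of being read from Table 37. In the sketch below the function names are our own; the rule is the one with n = 20 and critical set X ≥ 13, and the borderline value .60 separates the two kinds of error.

```python
from math import comb

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, n=20, c=13):
    """pi(p): probability of rejecting the null hypothesis, P(X >= c)."""
    return sum(b(k, n, p) for k in range(c, n + 1))

def error_probability(p, n=20, c=13, borderline=0.60):
    """Probability of a wrong decision when the true proportion is p."""
    if p <= borderline:
        return "first kind", power(p, n, c)       # rejecting a true null hypothesis
    return "second kind", 1 - power(p, n, c)      # accepting a false null hypothesis

for p in [i / 10 for i in range(11)]:
    kind, prob = error_probability(p)
    print(f"p = {p:.2f}  pi(p) = {power(p):.3f}  error of {kind}: {prob:.3f}")
```

For p = .50, .60, and .70 the printed error probabilities can be compared with the values .132, .416, and .228 given in the text.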

The committee therefore asks whether it is possible to formulate 
a decision rule for which errors of the first and second kind are both 
smaller than for the rule already mentioned. Let us see what happens 
if we keep the sample size fixed at n = 20. Then the only sensible 
rules that the committee will consider will be of the form: 


Reject the null hypothesis (that $p \le .60$)
if and only if X, the number in favor of
Smith among the 20 people in the sample,
is at least some specified number, say c.


Each choice of the number c determines one decision rule. We have
already discussed the rule with c = 13. In order to compare the various
possible rules, one determines the power function of each decision
rule by using the definition of $\pi(p)$ as the probability of rejecting the
null hypothesis when the parameter value is p. That is,

$$\pi(p) = P(X \ge c) = \sum_{k=c}^{20} b(k \mid 20, p).$$

We have used our table of binomial probabilities to compute $\pi(p)$
for c = 15, 16, 17. These values are given in Table 40 and graphs of
the corresponding power functions are drawn in Figure 29.

From either the table or the graphs we see that as c increases, the
probability of an error of the first kind decreases for all p satisfying
$p \le .60$, i.e., the graphs move down toward the ideal graph for which
the error probability is zero. But, at the same time the probability
of an error of the second kind increases; i.e., the graphs move down
away from the ideal graph for all p satisfying $p > .60$.

TABLE 40

  p         0    .10   .20   .30   .40    .50    .60    .70    .80    .90    1

 c = 15     0    0+    0+    0+   .002   .021   .126   .416   .804   .989    1

 c = 16     0    0+    0+    0+    0+    .006   .051   .238   .630   .957    1

 c = 17     …    …     …     …     …      …      …     .107   .411   .867    1
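The rows of Table 40 can be regenerated, and the trade-off between the two kinds of error made visible, with a few lines of code. This is a sketch with names of our own choosing; the grid of p values matches the column headings of the table.

```python
from math import comb

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, c, n=20):
    """pi(p) = P(X >= c) for X binomial with parameters n and p."""
    return sum(b(k, n, p) for k in range(c, n + 1))

P_VALUES = [i / 10 for i in range(11)]  # 0, .10, .20, ..., 1

def power_table(cs=(15, 16, 17), n=20):
    """Map each critical number c to its row of power-function values."""
    return {c: [power(p, c, n) for p in P_VALUES] for c in cs}

table = power_table()
for c, row in table.items():
    print(f"c = {c}: " + "  ".join(f"{v:.3f}" for v in row))
```

Reading down any column shows the power falling as c increases: this lowers the probability of an error of the first kind when p is at most .60, and raises the probability of an error of the second kind when p exceeds .60.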

[Graphs of the power functions $\pi(p) = \sum_{k=c}^{20} b(k \mid 20, p)$ for c = 15, 16, 17.]

Figure 29

The committee thus learns that with sample size 20 it cannot
simultaneously lower both the probability of making an error of the first
kind and the probability of making an error of the second kind. The
customary statistical procedure in this circumstance is to concentrate
on the errors of the first kind since, as we have noted earlier, they are
presumably more serious than those of the second kind. The committee
chooses a number, ordinarily denoted by $\alpha$, which is the maximum
probability of an error of the first kind that it will tolerate.
In actual practice the number $\alpha$ is often chosen as one of the numbers
.01, .05, or .10. Having picked $\alpha$, the particular decision rule is
adopted which not only meets the requirement that the maximum
probability of an error of the first kind does not exceed $\alpha$, but which
in addition yields the lowest possible probabilities of errors of the
second kind.

For example, suppose the committee chooses $\alpha = .02$. Then in
Figure 29 we seek that value of c for which the height of the power
curve for p = .60 (which gives the maximum probability of an error
of the first kind) does not exceed $\alpha = .02$, but is as close to .02 as
possible. We find from the figure or from the values in Table 40,
that c = 17 has the required properties. Thus the committee's choice
of $\alpha = .02$ dictates the use of the decision rule for which c = 17; i.e.,
the committee determines the value of X, the number of voters for
candidate Smith in the sample of size 20, and rejects the null
hypothesis if and only if $X \ge 17$. Although the committee now has a
very high probability of making an error of the second kind (wasting
money on an unnecessary strong campaign) it does have assurance
that there is only at most a 2% chance for an error of the first kind
(not waging a vigorous campaign when Smith needs it to win).

Now suppose that the committee is not willing to risk such large
chances of an error of the second kind. From Table 40, for example,
we find that $P(X \ge 17) = .107$ when p = .70. Thus there is almost
a 90% chance of accepting the null hypothesis (and therefore wasting
money on an unnecessary strong campaign), even when Smith has
70% of the voters on his side. What can be done to maintain the
maximum risk level given by $\alpha = .02$ but at the same time to lower
the risks of errors of the second kind?

With the sample size fixed at n = 20, there is nothing that can be
done. But if larger samples are permitted, then risks of errors of
both the first and second kind can be controlled. We illustrate this point
by considering samples of size n = 50, n = 100, and n = 300.
Our decision procedure is stated in general terms as follows:


A sample of n people is drawn from the population of all voters.
Let X be the random variable whose value is the number (among
the n selected people) who are in favor of Smith. Reject the null
hypothesis (that $p \le .60$) if and only if $X \ge c$, where c is determined
so that the maximum probability of an error of the first kind does
not exceed some prescribed value $\alpha$ (we suppose the committee
chooses $\alpha = .02$) and so that probabilities of errors of the second
kind are as small as possible.


From our previous discussion, the reader can see that to determine
c we proceed as follows: First put p = .60, since it is for this value
that the probability of an error of the first kind is largest. Any number
c for which $P(X \ge c)$ does not exceed $\alpha$, i.e., for which

(2.1)    $\displaystyle\sum_{k=c}^{n} b(k \mid n, .60) \le \alpha$,

will determine a decision rule whose maximum probability of an error
of the first kind is at most $\alpha$. To also minimize the probability of
making an error of the second kind, we select the smallest value of c
satisfying the inequality in (2.1). Put differently, we choose c as the
smallest number in the set

(2.2)    $\Bigl\{\, x \;\Big|\; \displaystyle\sum_{k=x}^{n} b(k \mid n, .60) \le \alpha \Bigr\}$.

(We are assuming that $\alpha$ is chosen so that $b(n \mid n, .60) \le \alpha$. The set
in (2.2) therefore contains the number n and so is not empty.) This
set, containing the values of X for which the null hypothesis is
rejected, is called the critical set of values of X for the given decision
rule.
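The search for the smallest c satisfying (2.1) is mechanical and can be sketched as follows. The function name `critical_value` and the approach of accumulating the upper tail from k = n downward are our own choices; any method of evaluating the sum in (2.1) would serve.

```python
from math import comb

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def critical_value(n, alpha, p0=0.60):
    """Smallest c with P(X >= c) <= alpha when p = p0, as in (2.1) and (2.2).

    Returns n + 1 if even b(n|n, p0) exceeds alpha, i.e. if the set in (2.2)
    is empty; the text assumes alpha is chosen so that this does not happen.
    """
    tail = 0.0
    c = n + 1
    for k in range(n, -1, -1):      # add terms b(n), b(n-1), ... to the tail
        tail += b(k, n, p0)
        if tail > alpha:            # k is too small to belong to the critical set
            break
        c = k
    return c

for n in (20, 50, 100, 300):
    print(n, critical_value(n, alpha=0.02))
```

With $\alpha = .02$ the results can be compared with the values of c quoted in the text for n = 20, 50, 100, and 300.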

The reader should refer to Table 37 and check that with $\alpha = .02$
and n = 20, the value of c determined in this way is c = 17, as we
have already seen in Table 40 and Figure 29. Using more extensive
tables of cumulative binomial probabilities, we similarly find that
with $\alpha = .02$, the smallest value of c satisfying (2.1) is c = 38 for
n = 50, c = 71 for n = 100, and c = 198 for n = 300. We therefore
have four decision rules, all determined by the committee's setting
of $\alpha = .02$ as the maximum tolerable probability of an error of the
first kind. Values of the power function of these four rules are given
in Table 41. For comparison with Figure 29, we graph the three
power functions for sample sizes n = 50, 100, and 300 in Figure 30.

TABLE 41

                          p      .50     …

        n = 20,  c = 17         .001     …
 π(p)   n = 50,  c = 38          0+      …
        n = 100, c = 71          0+      …
        n = 300, c = 198         0+      …


As expected, the risk of making an error of the second kind goes
down as the sample size increases; i.e., for each $p > .60$, as n increases
the curves move up toward the ideal graph for which the probability
of an error of the second kind is zero. From Table 41 we read that
with n = 300, the probability of finding at least 198 persons in favor
of Smith when p = .70 is .941, so that the probability of an error of
the second kind is reduced to .059. Thus we have demonstrated how
the committee can maintain its maximum tolerable probability of an
error of the first kind at $\alpha = .02$ and can also control the risks of
errors of the second kind by sampling a sufficiently large number of
people from among the entire population of voters.

[Graphs of the power functions $\pi(p) = \sum_{k=c}^{n} b(k \mid n, p)$ for sample sizes n = 50, 100, and 300.]
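The effect of sample size on the error of the second kind can be examined by evaluating the power function of each of the four rules at a fixed p above .60. The sketch below does this for p = .70; the list of rules is taken from the text, while the function and variable names are our own.

```python
from math import comb

def b(k, n, p):
    """Binomial probability b(k|n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def power(p, n, c):
    """pi(p) = P(X >= c) for a sample of size n."""
    return sum(b(k, n, p) for k in range(c, n + 1))

# The four decision rules (n, c) determined by alpha = .02 in the text.
RULES = [(20, 17), (50, 38), (100, 71), (300, 198)]

p = 0.70
for n, c in RULES:
    pi = power(p, n, c)
    print(f"n = {n:3d}, c = {c:3d}: pi({p}) = {pi:.3f}, "
          f"error of second kind = {1 - pi:.3f}")
```

Changing the value of p in the sketch gives further points on each power curve.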

For the remainder of our discussion, in order that we may use 
Table 37, we return to the simple decision rule with n = 20, c = 17. 
Since c is chosen as the smallest number in the set defined in (2.2), 
we know that 


(2.3)    $P(X \ge x) = \displaystyle\sum_{k=x}^{20} b(k \mid 20, .60) \le .02$   for all $x \ge 17$,

and

(2.4)    $P(X \ge x) = \displaystyle\sum_{k=x}^{20} b(k \mid 20, .60) > .02$   for all $x < 17$.


We are now able to see that although it is helpful in understanding 
the method of testing hypotheses to determine decision rules and 
power functions, it is in practice unnecessary to do so if all one wants 
to do is decide whether the experimental evidence leads to acceptance 
or rejection of the null hypothesis. 

The result of the poll of 20 voters is the occurrence of the event
X = x, where x is an integer from 0 (none in favor of Smith) to 20
(all in favor of Smith). The larger the value of X, the more unfavorable
is the result of the poll to the null hypothesis that $p \le .60$. The
number $P(X \ge x)$, calculated for the borderline value p = .60, is the
probability of getting a value of X at least as unfavorable to the null
hypothesis as the one actually observed and is called the statistical
significance or the descriptive level of significance of the observed event
X = x.

According to (2.3) and (2.4), if the descriptive level of significance 
of X = x is less than or equal to α = .02, then the null hypothesis 
is rejected, since x must then be greater than or equal to 17; if the 
descriptive level of significance of X = x is greater than α = .02, 
then the null hypothesis is accepted, since x is then less than 17. A 
value of X that leads to rejection of the null hypothesis is said to be 
significant at the level α = .02 (or at the 2% level of significance); a 
value of X that leads to acceptance of the null hypothesis is not 
significant at the level α = .02. Testing the significance of the observed 
value of X at the level α (i.e., computing P(X ≥ x) for p = .60 and 
comparing it with α) is therefore a way of determining the action to 
be taken without first finding the decision rule and its power function. 
For this reason, tests of statistical hypotheses are often called tests 
of significance. 

To illustrate these ideas, suppose the committee has decided to 
sample n = 20 people and has set α = .02 as the maximum tolerable 
probability of an error of the first kind. The 20 people are polled and 
the event X = 16 occurs; i.e., 16 people are in favor of Smith. With 
p = .60, we find from Table 37 that 


P(X ≥ 16) = .051. 


Since .051 > .02, the observed event X = 16 is not significant at 
the 2% level of significance and the committee therefore accepts the 
null hypothesis. Note that if the committee had set α = .06, say, 
then this same value of X would be significant at the 6% level of 
significance (since .051 < .06) and would therefore lead to rejection 
of the null hypothesis. By increasing α, the committee increases its 
chances of rejecting the null hypothesis. It also, of course, increases 
the chances of making an error of the first kind.* 
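
The significance test just illustrated amounts to one tail-probability computation and one comparison. The sketch below reproduces it under the text's numbers (n = 20, borderline p = .60, observed X = 16); the function name `descriptive_level` is an assumption chosen for this example.

```python
from math import comb

def descriptive_level(x, n, p0):
    """Descriptive level of significance of X = x: P(X >= x) computed at p0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(x, n + 1))

level = descriptive_level(16, 20, 0.60)
print(f"P(X >= 16) = {level:.3f}")

# Compare with two choices of the maximum tolerable probability alpha.
for alpha in (0.02, 0.06):
    verdict = "reject" if level <= alpha else "accept"
    print(f"alpha = {alpha:.2f}: {verdict} the null hypothesis")
```

No decision rule or power function is needed for this check, which is the point the text makes about tests of significance.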

We conclude by reminding the reader that we have discussed only 
one particular problem and that null hypotheses, decision rules, and 
tests of significance will generally assume different forms in different 
problems. Nevertheless, an understanding of this section should enable 
the reader to solve a variety of problems of the sort treated here 
where the binomial distribution applies. This hypothesis can be 
tested by trying the problems that follow. 


* The obvious fact that errors can be made when using tests of significance 
means that these tests are fraught with danger and must be interpreted with 
great caution. For a particularly impressive discussion of this point, with special 
reference to the field of psychology, see T. D. Sterling, “Publication Decisions 
and Their Possible Effects on Inferences Drawn From Tests of Significance—or 
Vice Versa,” Journal of the American Statistical Association, vol. 54 (1959). 


PROBLEMS 


2.1. Consider the decision problem discussed in the text, and suppose the 
committee chooses to base its decision on a sample of n = 20 people. 
As in the text, let α denote the maximum tolerable probability of an 
error of the first kind. 

(a) What decision rule is determined if α = 0? What then is the 
probability of an error of the second kind for all p > .60? Draw the graph 
of the power function for this rule. 

(b) What decision rule is determined if the committee insists that the 
probability of an error of the second kind must be zero for all 
p > .60? What then is the value of a? Draw the graph of the power 
function of this decision rule. 

(c) What decision rule is determined if α = .10? For this decision rule, 
what is the probability of an error of the second kind if p = .70? 
if p = .80? What is the probability of an error of the first kind 
if p = .50? 

(d) The committee has decided to use α = .10 as in part (c) and finds 
that 75% of the people in the sample are in favor of Smith. What 
is the descriptive level of significance of this observed event? Does 
the committee wage a very expensive or a less expensive campaign? 

(e) The committee decides to use α = .01. How many people in the 
sample of 20 must be in favor of Smith before the observed event is 
significant at the level .01? 

2.2. Suppose the committee decides on a sample of n = 10 people and sets 
α = .05. Determine the decision rule that should be used and draw the 
graph of its power function. 

2.3. In a study of the effects of stress,* 20 college students were taught to 
tie a bowline knot by two different methods. Half the subjects learned 
method A first and the other half learned method B first. Later, after 
an active day and an evening final examination, each subject was asked 
to tie the knot. The prediction was that stress would induce regression, 
i.e., the subjects would tend to revert to the first-learned method of 
tying the knot. Each subject was classified as a success (used knot-tying 
method he learned first) or a failure (used method he learned last). 
Assume that the experiment can be idealized as a set of 20 Bernoulli trials 
with (unknown) probability p for success on each trial. 

Suppose the null hypothesis expresses the fact that there is no 
regression and that under stress it is equally likely to use either of the two 
methods of tying the knot. (It is cases of this kind that explain the use 
of the word “null”: the null hypothesis asserts that stress has no effect.) 
The alternate hypothesis will then state that regression does occur; i.e., 
it is more probable that under stress the first-learned method is used 
than the second-learned method. 

(a) Formulate these hypotheses in terms of p and determine the decision 
rule if α = .05; i.e., if an error of the first kind has probability at 
most .05. 

(b) Suppose 15 of the 20 subjects use the first-learned method of tying 
the knot. What is the descriptive level of significance of this 
observed outcome? Is it significant at the 5% level? Is it significant 
at the 1% level? 

* Barthol, R. P. and N. D. Ku, “Regression Under Stress to First Learned 
Behavior,” Journal of Abnormal and Social Psychology, vol. 59 (1959), pp. 131-36. 


2.4. Determine a decision rule to test the null hypothesis p = .20 against the 
alternate hypothesis p = .60, assuming a sample of size n = 10 and a 
maximum tolerable probability for an error of the first kind equal to .05. 
What is the actual probability of an error of the first kind for your test? 
What is the probability that you will incorrectly accept the null 
hypothesis when p = .60? 


2.5. The production manager of a company submits a report recommending 
hiring of additional repairmen. His conclusions are based on the 
assumption that, on the average, 20% of the machines in the shop will require 
maintenance on any given day; i.e., the probability is .20 that a machine 
observed for a period of a day (a machine-day) will need the services of 
a repairman. The president of the company is interested in testing this 
assumption, since the conclusions of the report will be different if the 
assumed 20% is either too high or too low. Suppose (unrealistically, but 
in order to be able to use Table 37) that only 20 machine-days are 
observed and the president is willing to take at most a 10% risk of 
rejecting the assumption if it is true (α = .10). 

(a) Formulate a null and alternate hypothesis, explaining how the 
binomial distribution applies (i.e., define trial, success, failure, etc.). 

(b) Determine a reasonable decision rule for testing the null hypothesis. 

(c) Of the 20 machine-days observed, seven required services of a 
repairman. What is the descriptive level of significance of this event? 
Is it significant at the .10 level? 

(d) Draw the graph of the power function of the decision rule in (b) and 
on the same figure also draw the corresponding graph for an ideal 
decision rule for which the probabilities of both kinds of errors are 
zero. 


2.6. A hair tonic manufacturer claims that his product will cure baldness at 
least 70% of the time it is used according to instructions. Formulate 
null and alternate hypotheses to test this claim. Determine a decision 
rule assuming the maximum tolerable probability of an error of the first 
kind is α = .05. Use a sample of size n = 20. 


3. An example of decision-making under uncertainty 


Analyses of the type discussed in the preceding section can be 
carried still further and made more realistic if we assign relative 
values to the losses that will arise when various kinds of errors are 
made. We should also take note of the fact that sampling involves 
certain expenses and that in some practical situations larger samples 
may cost more to obtain than the consequent reduction in proba- 
bilities of errors is worth. In short, statistical investigations are 
undertaken as a basis for action, and decisions should therefore be 
made in the light of all their relevant consequences. 

We shall illustrate this approach by discussing a particularly 
simple problem that can be solved with the mathematical skills we 
have now accumulated.* 

Before each production run, a machine used to produce a certain 
part must be adjusted by an operator. Five hundred parts are pro- 
duced in each such run, and each part is classified as either good or 
defective. On the basis of his experience with the machine, the 
manufacturer is willing to assume that the production of the 500 
parts can be thought of as a Bernoulli process in which the proba- 
bility of a defective, denoted by p and called the average fraction 
defective, is the same on each of the 500 trials. 

Two delicate adjustments must both be made perfectly by the 
operator before each run in order to have p = .01, which is the very 
best that the machine can do, because of mechanical limitations. 
But if only one of these adjustments is properly made, then the 
average fraction defective becomes p = .10, and if the operator happens to 
make neither adjustment properly then p = .20. We therefore have 
three possible "states" for the machine. On the basis of records of 
past production runs made by the operator, the manufacturer 
estimates that the operator will have both adjustments right 80% of 
the time, miss exactly one adjustment 15% of the time, and miss 
both adjustments 5% of the time. These data are summarized in 
Table 42. 


* This problem, with changes in numerical values, is one treated in great detail 
by somewhat different methods in R. Schlaifer, Probability and Statistics for 
Business Decisions, McGraw-Hill Book Company, Inc., 1959, especially Chapters 
22 and 33. I here express my appreciation to Professor Schlaifer for permission 
to use this material. I am also indebted to Professor Howard Raiffa for 
introducing me to this kind of decision problem and for the particular method of 
solution used in the text. 


TABLE 42 

                                   Probability of a Defective   Probability 
State of Machine                   Part Given This State        of This State 

I   (Both adjustments right)       .01                          .80 
II  (Only one adjustment right)    .10                          .15 
III (Neither adjustment right)     .20                          .05 


Each of the 500 parts produced by the machine, whether good or 
defective, is used in assembling the final product, but a defective part 
requires special hand fitting which costs $5.00 per part. This means 
that a faulty machine setup (i.e., machine in states II or III) can 
lead to a fairly high cost of using defective parts. 

However, the manufacturer can reduce these costs by calling in a 
master mechanic before the production run. If this is done, the 
machine is certain to be properly adjusted and therefore will be in state I, 
where the average fraction defective is equal to its minimum value 
p = .01. This special use of the master mechanic, however, costs $50. 
Thus, if the regular operator has put the machine in state I (as he 
does most often), then this $50 would be a total loss. On the other 
hand, if the operator has missed one or both of the adjustments, then 
the saving in cost of using defective parts more than offsets the extra 
cost of hiring the master mechanic. 

Finally, the manufacturer considers that he may be able to reduce 
his average costs by inspecting a sample of the product after the 
operator prepares the machine, but before beginning the actual 
production run. He might, for example, make a sample run of ten parts 
and then inspect each part. When the number of defectives among 
the ten parts in the sample is “high,” he would call the master 
mechanic; when it is “low,” he would order the regular run of 500 parts 
to be made without readjustment. But sample production and 
inspection cost $2.00 per part. 

There are two possible decisions that the manufacturer can make: 


1. Order the production run to proceed after the machine is 
prepared by the regular operator without calling the master mechanic; 
we shall call this a decision to proceed. 

2. Call the master mechanic and have him readjust the machine 
so that its average fraction defective is certain to be p = .01; we 
shall call this a decision to readjust. 

The manufacturer, whose aim is to make the average cost of the 
entire production run as low as possible, asks the following questions: 


1. Should the decision to proceed or readjust be made without a 
sample run or on the basis of evidence accumulated (at a price due to 
inspection costs) in a sample run? 

2. If a sample run is not indicated, then which of the two decisions 
should be made? 

3. If a sample run is indicated, then how large a sample should be 
taken; i.e., how many parts should be made by the machine? (We 
assume that all of these parts will be inspected.) And what decision 
rule should then be adopted; i.e., for each possible outcome (as 
measured by the number of defective parts discovered in the sample), 
which of the two decisions should be made? 


By answering these questions, we give the manufacturer a rule for 
action in the face of uncertainty (since the actual state of the ma- 
chine is unknown). Moreover, this rule will be optimum in that it 
minimizes the average cost of the entire production run. 

We first investigate the costs involved in each decision, assuming 
no sample run is made. If the decision is to proceed and we suppose 
that the state of the machine is given (i.e., p is 
known), then the mean number of defectives produced in the run is 
500p, and so the mean cost of defectives is 500p × $5.00 = 2500p 
dollars. We compute this mean cost for each possible state of the 
machine in Table 43. Similarly, if the decision 
is to readjust, then the master mechanic adjusts the machine (makes 
p = .01) and the mean number of defectives produced in the run is 
500p = 5 parts. Hence, no matter what state the machine was put 
in by the regular operator, the mean cost of defectives when the 
decision is to readjust is 5 × $5.00 = $25.00, plus the $50.00 cost of the 
master mechanic, or a total of $75.00. This cost is listed in the fourth 
column of Table 43. 


TABLE 43 

                           Mean Cost of Defectives 
           Probability     Given This State When     Loss Due to          Loss Due to 
           of a Defective  Decision Is to            Proceeding Rather    Readjusting Rather 
State of   Part Given                                Than Taking Better   Than Taking Better   Probability 
Machine    This State      Proceed      Readjust     Decision if State    Decision if State    of This State 
                                                     Were Known           Were Known 

I          .01             $ 25*        $75          $  0                 $50                  .80 
II         .10              250          75*          175                   0                  .15 
III        .20              500          75*          425                   0                  .05 

Note that if the machine is in state I, then the better decision (i.e., 
the one with the lower mean cost) is the decision to proceed. But if 
the machine is in state II or III, then the better decision is to re- 
adjust. The asterisks in Table 43 indicate the mean costs of the 
better decision if the state of the machine were known; i.e., if perfect 
information concerning the quality of the adjustments made by the 
regular operator were available to the manufacturer. But such per- 
fect information is not available. Thus, if the machine is known to be 
in state I, then the better decision is to proceed and this action in- 
volves a mean cost of $25. Since this is the better decision for state I, 
the loss due to proceeding in the absence of perfect information hap- 
pens to be zero; but the loss due to readjusting rather than taking the 
better decision is the mean cost of readjusting ($75) minus the mean 
cost of the better decision ($25), and hence is $50. Similarly, if the 
machine is in state II, then the better decision is to readjust. Hence, 
the loss due to proceeding rather than taking this better decision is 
the mean cost of proceeding ($250) minus the mean cost of the better 
decision ($75) or $175. In this way, we compute the losses given in 
the fifth and sixth columns of Table 43. 

The loss due to proceeding is therefore $0, $175, or $425 with prob- 
ability .80, .15, and .05 respectively, these being the given proba- 
bilities of the three states of the machine. Hence we find: 


Mean loss due to proceeding = 0(.80) + 175(.15) + 425(.05) 
                            = $47.50. 

Similarly, we find: 

Mean loss due to readjusting = 50(.80) + 0(.15) + 0(.05) 
                             = $40.00. 


We conclude that if no sample run is ordered, then the mean loss 
due to the decision to proceed is $7.50 more than the mean loss due 
to the decision to readjust. The manufacturer should therefore 
decide to readjust; i.e., the master mechanic should be called in before 
the production run, thus making average costs $7.50 less per run than 
with the alternate decision to proceed. 
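
The no-sample analysis above can be retraced in a few lines. This is an illustrative sketch only: the constant and variable names are assumptions made for the example, while the numbers (500 parts, $5.00 per defective, $50 mechanic fee, and the state probabilities) come from the text.

```python
PARTS_PER_RUN = 500
COST_PER_DEFECTIVE = 5.00
MECHANIC_FEE = 50.00
BEST_P = 0.01  # fraction defective after the master mechanic readjusts

# state -> (probability of a defective part, probability of the state)
states = {"I": (0.01, 0.80), "II": (0.10, 0.15), "III": (0.20, 0.05)}

def cost_proceed(p):
    """Mean cost of defectives if the run proceeds with fraction defective p."""
    return PARTS_PER_RUN * p * COST_PER_DEFECTIVE

def cost_readjust():
    """Mean cost if the master mechanic is called, whatever the state was."""
    return cost_proceed(BEST_P) + MECHANIC_FEE

mean_loss = {"proceed": 0.0, "readjust": 0.0}
for name, (p, prob) in states.items():
    proceed, readjust = cost_proceed(p), cost_readjust()
    better = min(proceed, readjust)  # cost of the better decision for this state
    mean_loss["proceed"] += prob * (proceed - better)
    mean_loss["readjust"] += prob * (readjust - better)
    print(f"state {name}: proceed ${proceed:.2f}, readjust ${readjust:.2f}")

print({decision: round(loss, 2) for decision, loss in mean_loss.items()})
```

The decision with the smaller mean loss is the one to take when no sample run is made.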

Although the decision to readjust is the better decision, we have 
computed that the mean loss due to readjusting is $40.00. Hence 
$40.00 is the mean cost of uncertainty in this problem, and the 
manufacturer could therefore afford to pay any price up to $40.00 for the 
certain knowledge (never available in practice) of the value of p. In 
other words, his costs would average $40.00 less per run if he had 
information that allowed him to make the better decision for each 
state, i.e., if he knew the true value of p and took the decision to 
readjust only when p = .10 or p = .20 (when it paid to readjust). 
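
As a check on the $40.00 figure, the mean costs per run with and without knowledge of the state can be compared directly. The worked lines below are a sketch using only the costs and state probabilities already given ($25 and $75 for the better decisions; probabilities .80, .15, .05); the row labels are informal descriptions, not terms from the text.

```latex
\begin{align*}
\text{mean cost, always readjusting} &= 75(.80) + 75(.15) + 75(.05) = \$75.00,\\
\text{mean cost, state known}        &= 25(.80) + 75(.15) + 75(.05) = \$35.00,\\
\text{mean cost of uncertainty}      &= 75.00 - 35.00 = \$40.00.
\end{align*}
```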
The state of uncertainty is somewhat reduced by evidence accu- 
mulated in a sample run. But such evidence costs $2.00 for each part 
produced by the machine and inspected. We turn now to the prob- 
lem of determining whether the mean loss of $40.00 just computed 
can be lowered by ordering a sample run before making a decision 
to proceed or readjust. 
The decision rules we allow are of the following form: 
Order a sample run of n parts. Let X be the random variable whose 
value is the number of defectives among the n parts in the sample. 
Make the decision to readjust if X ≥ c, where c is some specified 
number. Make the decision to proceed if X < c. 


Each choice of n and c determines one such rule, which we call the 
(n, c) decision rule. For example, the (5, 1) decision rule requires that 
a sample of n = 5 parts be produced; the decision to readjust is then 
made if and only if the number of defectives turns out to be at least 1. 

We now demonstrate how to compute the mean loss from the use 
of such a decision procedure. We shall for the moment concentrate 
on explaining the construction of Table 44 which concerns the (5, 1) 
decision rule. A similar analysis applies to any (n, c) rule, no matter 
what the values of n and c. 


TABLE 44 

(1)        (2)             (3)            (4)            (5)           (6)         (7)          (8) 
           Probability     Probability    Probability                              Mean         Mean Total 
State      of a Defective  of Decision    of Decision    Probability   Loss Due    Loss Due     Loss Due 
of         Part Given      to Readjust    to Proceed     of Wrong      to Wrong    to Wrong     to Wrong 
Machine    This State      Using (5, 1)   Using (5, 1)   Decision      Decision    Decision     Decision 
                           Rule           Rule 

I          .01             .049           .951           .049          $ 50        $  2.45      $ 12.45 
II         .10             .410           .590           .590           175         103.25       113.25 
III        .20             .672           .328           .328           425         139.40       149.40 


Mean loss due to use of (5, 1) decision rule = 12.45(.80) + 113.25(.15) + 149.40(.05) 
                                             = $34.42. 

Columns (1) and (2) of Table 44 are clear. Column (3) is obtained 
directly from the cumulative binomial probabilities in Table 37. 
Under the (5, 1) rule, the decision to readjust is made when X, the 
number of successes (defectives) in the n = 5 Bernoulli trials making 
up the sample run, is at least 1. If p = .01, we find P(X ≥ 1) = .049. 
Similarly, we read directly from the binomial tables that P(X ≥ 1) 
equals .410 if p = .10 and equals .672 if p = .20. Thus column (3) 
is completed. The probability that the (5, 1) rule will lead to a 
decision to proceed is 1 minus the probability that it will lead to a 
decision to readjust. Hence, the entries in column (4) in Table 44 are 
obtained directly from those in column (3). 

The probability of a wrong decision, entered in column (5), is 
merely the probability that the decision rule leads to readjustment 
if p = .01 (when the better decision is known to be to proceed) and 
to proceeding if p = .10 or p = .20 (when the better decision is 
known to be to readjust). 

The loss due to a wrong decision has been computed in Table 43, 
and so column (6) of Table 44 is easily completed. 

The entries in column (5) are multiplied by the entries in column 
(6) to give the mean loss due to a wrong decision; i.e., this mean loss 
(for given p) is the product of the loss and the probability with which 
it is sustained. 

Finally, to the mean loss entered in column (7) we add the cost 
of the sample, which is $2 for each of the five parts sampled. The 
entries in column (8) are therefore merely $10 more than those in 
column (7). 

Since we are given (in Table 42) the probabilities of the three possi- 
ble states, we can compute the overall mean loss due to the use of the 
(5, 1) decision rule. This we do in the lower part of Table 44. Since 
this mean loss is $34.42 and is therefore less than the mean loss of 
$40.00 due to a decision to readjust without a sample run, we see 
immediately that it does pay to order a sample run. The only remain- 
ing question is therefore the choice of the best possible decision rule. 
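
The column-by-column construction of Table 44 can be written as a single function of n and c. The sketch below is an illustration under the text's figures; names such as `STATES`, `prob_readjust`, and `mean_loss` are assumptions made here. It uses exact binomial probabilities rather than three-decimal table values, so results may differ from the table by a cent or so.

```python
from math import comb

SAMPLE_COST_PER_PART = 2.00

# (probability of a defective, probability of state, better decision, loss if wrong)
STATES = [
    (0.01, 0.80, "proceed", 50.0),
    (0.10, 0.15, "readjust", 175.0),
    (0.20, 0.05, "readjust", 425.0),
]

def prob_readjust(n, c, p):
    """Column (3): P(X >= c) for X binomial with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def mean_loss(n, c):
    """Overall mean loss of the (n, c) rule, including the cost of the sample."""
    total = 0.0
    for p, prob_state, better, loss in STATES:
        readjust = prob_readjust(n, c, p)
        # Column (5): the wrong decision is to readjust in state I, to proceed otherwise.
        prob_wrong = readjust if better == "proceed" else 1 - readjust
        # Columns (7) and (8): mean loss from a wrong decision plus sampling cost.
        total += prob_state * (prob_wrong * loss + SAMPLE_COST_PER_PART * n)
    return total

print(f"mean loss of (5, 1) rule: ${mean_loss(5, 1):.2f}")
```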

To find the best decision rule, we must find the best pair of values 
of n, the sample size, and c, the smallest number of defectives leading 
to a decision to readjust. (Of course, "best" is interpreted as lowest 
mean loss for the production run of 500 parts.) We proceed by first 
finding the best value of c, given the sample size n; we then compare 
different values of n when each is used with the value of c that is 
best for it. From this point on, we only state results. The reader 
can verify each of our statements by carrying out computations 
similar to those used in constructing Table 44. (We omit problems 
at the end of this section, since there is ample opportunity to test 
one's understanding by checking our results.) 

By keeping the sample size fixed at n = 5 and varying c, we find 
the following mean losses due to use of (5, c) decision rules: 


Decision Rule Mean Loss 
(5, 1) $34.42 
(5, 2) 49.82 
(5, 3) 56.03 


It is clear that for samples of size n = 5, the best value of c is 
c = 1. In fact, similar computations show that for each of the sample 
sizes n = 4, 5, 6, 7, 8, 9 the best value of c is c = 1. (This is a 
peculiarity of our particular problem and is not generally true.) We thus 
obtain the following mean losses for decision rules with various 
sample sizes, each computed with the value c = 1 that is best for it. 


Decision Rule Mean Loss 
(4, 1) $35.49 
(5, 1) 34.42 
(6, 1) 33.87 
(7, 1) 33.73 
(8, 1) 33.94 
(9, 1) 34.45 


We note that rule (7, 1) has the lowest mean loss. It is therefore 
the decision rule preferred by the manufacturer. He orders a run of 
n = 7 sample parts. If at least one of these seven parts is defective, 
he spends the $50 required to have the master mechanic readjust the 
machine. If no defectives are found among the seven parts, then he 
orders the run to proceed without readjustment. His mean cost of 
uncertainty is thereby reduced to $33.73 from the $40.00 cost 
resulting from his best decision (namely, to always call the master 
mechanic) in the absence of a sample run. 
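
The comparison of decision rules can be carried out by a brute-force search over n and c. This is a sketch, not the author's procedure: the search range n = 1, ..., 12 and all names are assumptions for illustration, and because exact binomial probabilities are used instead of three-decimal table entries, the printed losses may differ from the tabulated ones by a few cents.

```python
from math import comb

# (probability of a defective, probability of state, readjusting is wrong?, loss if wrong)
STATES = [(0.01, 0.80, True, 50.0), (0.10, 0.15, False, 175.0), (0.20, 0.05, False, 425.0)]
SAMPLE_COST_PER_PART = 2.00

def tail(n, c, p):
    """P(X >= c) for X binomial(n, p): the probability the rule says 'readjust'."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def mean_loss(n, c):
    """Mean loss of the (n, c) rule: expected loss from wrong decisions plus sample cost."""
    loss = SAMPLE_COST_PER_PART * n
    for p, prob_state, readjust_is_wrong, cost in STATES:
        prob_wrong = tail(n, c, p) if readjust_is_wrong else 1 - tail(n, c, p)
        loss += prob_state * prob_wrong * cost
    return loss

# Candidate rules: every sample size up to 12 and every cutoff c = 1, ..., n.
candidates = [(n, c) for n in range(1, 13) for c in range(1, n + 1)]
best = min(candidates, key=lambda rule: mean_loss(*rule))

for n in range(4, 10):
    print(f"({n}, 1): ${mean_loss(n, 1):.2f}")
print("best rule:", best, f"with mean loss ${mean_loss(*best):.2f}")
```

Widening the search range is a one-line change; larger samples only add sampling cost once the error probabilities are already small.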


SUPPLEMENTARY READING 


The binomial distribution is discussed to a greater or lesser extent in 
the probability and statistics books included in reading lists at the end of 
preceding chapters. Some idea of statistical applications will be obtained 
by also consulting the following books, in addition to the references 
mentioned in footnotes. 


1. Ackoff, R. L., The Design of Social Research, University of 
Chicago Press, 1953. 

2. Bross, I. D. J., Design for Decision, The Macmillan Company, 1953. 

3. Cowden, D. J., Statistical Methods in Quality Control, Prentice-Hall, 
Inc., 1957. 

4. Dodge, H. F. and H. G. Romig, Sampling Inspection Tables, 2nd 
edition, John Wiley and Sons, Inc., 1959. 

5. Mosteller, F., “Applications,” pp. xxxiv-lxi, in Tables of the 
Cumulative Binomial Probability Distribution, Annals of the Computation 
Laboratory of Harvard University, vol. XXXV, Harvard University 
Press, 1955. 

6. Sprowls, R. C., Elementary Statistics for Students of Social Science 
and Business, McGraw-Hill Book Company, Inc., 1955. 

7. Wallis, W. A., and H. V. Roberts, Statistics, A New Approach, The 
Free Press, 1956. 


Note. Now that you are at the end of this book, you can review some 
of the things you have learned and prepare the way for continued 
study of probability by reading the following articles. 

Curtiss, J. H., “Elements of a Mathematical Theory of 
Probability,” Mathematics Magazine, vol. 26 (1953), 233-254. 

Halmos, P. R., “The Foundations of Probability,” American 
Mathematical Monthly, vol. 51 (1944), 493-510. 

Robbins, H., “The Theory of Probability,” Chap. XI in Insights 
Into Modern Mathematics, Twenty-third Yearbook, National 
Council of Teachers of Mathematics, Inc., 1957. 


ANSWERS TO ODD-NUMBERED PROBLEMS 


Chapter 1 
(a) Finite, one element; (c) Infinite; (e) Finite, four elements, 
(12224232, 1523542, 254532221, 
2342-1); (g) Finite, two elements, (2, lys 
Consider numbers of the form n² + x(n − 1)(n − 2)(n − 3), and find 
x such that this number is 94 when n = 4. x = 13. 
(a) {(2, 3)}, the point of intersection of the two lines; 
(b) Ø, for the two lines, being parallel, have no points in common; 
(c) {(x, y) | x + y = 5}, the set of all points on the graph of the 
equation x + y = 5, for the two equations define the same line. 
(a) A=B; (QA- B; (QA- (5,2) =B = (052. 
(a) The same number appears on each die; (c) The sum of the num- 
bers is 4. 
B. 
(a) Correct; (b) Incorrect, for the only subsets of {{1}} are Ø and 
{{1}}; (c) Correct, for the elements of {1, {1}} are 1 and {1}; 
(d) Correct. 
(a) {(0, 2), (0, −2)}; (c) Ø; (e) Upper semicircle, including the 
points (−2, 0) and (2, 0). 
5·4 = 20. 
8·8·9·10⁴ = 5,760,000. 
(a) 6; (b) 9; (c) 3; (d) 6. 
Assuming the coins distinguishable: 2°, 24, 2" ways. 
(a) 169; (b) 338; (c) 169. 
(a) 2, 33, n*; (b) X, 3", m. 
3.1. A' = {b, c}, B' = {a, c}, A ∪ B = {a, b}, A ∩ B = Ø, 
A' ∩ B' = {c}, A' ∩ (A ∪ B) = {b}. 

3.3. (a) U = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, where 
an element represents the outcome for the penny first, the nickel, and 
then the dime; 

(b) A' = {THH, THT, TTH, TTT}, 
A ∪ B = {HHH, HHT, HTT, HTH, TTT}, 
A ∩ C = {HHH, HHT, HTH}, 
A' ∩ C = {THH}, (A ∩ B) ∩ C = {HHH}. 

3.5. (a) (i) 54, (iii) 3; (b) (1) Y OC, (3) (YU N)' or Y' (1 N*. 

3.7. (a) n(U) = n(A) + n(A'); (b) Let B = A' and note that then 
n(A ∩ B) becomes n(Ø) = 0. 

3.9. 4. 

311. ØS (ANB)ANCECBNA=ANBCBCAUB 
C(AUB)UC-AU(UOC CG U= f. 
313. (a) PU B= Bor P(1B- Por PAB =9; 
(c) (MAC) OW =f; 
() PABAM) #6; (g) (PY B) C Q' AW) #9; 
GQ BUI-IoeB(OI-BorB(Y'-9; (k) B=I. 
4.1. (1a) 


(9a) 


BNC\AU(BNC)|AUB|AUC| (AU B) (AUC) 


€ 
€ 
€ 
€ 
€ 
Li 
Li 
€ 


manam OR OR o 
mm mm mmm om 
am mm mmm n^ 


J 


Referring to Figure 10 of the text, we see that both sides are 
represented by Rı & R: & Ra & R; & Rs. 
4.3. 


(a) 
ul mmm 4 
A B A’ B A'n B' (A' n BY’ AUB | 
€ € € Li € € € 
€ € € € € € € 
[i € € € Li € € 
€ € € € € ri g 


AL BL CL BAC] (AN (BNO) | (AN (BNO)'| AUB |AUBUC’ 


4.5. 


4.7. 


4.9. 


411. 
5.1. 


5.3. 




(a) In Figure 9 of the text, both sides are represented by Ri & R: & Rs; 
(b) In Figure 9 of the text, both sides are represented by R&R & Ry; 
(e) In Figure 10 of the text, both sides are represented by R&R & 
R: & R; & Rs & R; & Rs. 
(a) If (1) C ∪ B = B and (2) B ∪ W = W, then C ∪ W = W. 

Proof: C ∪ W = C ∪ (B ∪ W) = (C ∪ B) ∪ W = B ∪ W = W, 
by using (2), law 8a, (1) and (2), respectively. 
(A ∩ B) ∩ (C ∩ D) = [(A ∩ B) ∩ C] ∩ D, by law 8b, 
                  = [A ∩ (B ∩ C)] ∩ D, by law 8b. 

(a) U, Ø; (b) Ø, U; (c) A, A', U, Ø; (d) same as (c). 
(a) (1, 1), (1, 2), (2, 1), (2, 2); (c) (1, 2), (1, 3), (2, 2), (2, 3); 
(e) (2, 3); (g) (1, 2), (1, 3), (2, 2), (2, 3). 
(a) If A = B, then A × B = A × A = B × A. The converse is false, 
but if A × B ≠ Ø, i.e., neither A nor B is the null set, then the 
converse is true; 
(c) A × B ⊆ C × D if A ⊆ C and B ⊆ D. Proof: We consider two 
cases: (1) If A × B = Ø, then clearly A × B = Ø ⊆ C × D. (2) If 
A × B ≠ Ø, then let (a, b) be any element of A × B. Since A ⊆ C 
and B ⊆ D, we have a ∈ C and b ∈ D. Thus (a, b) ∈ C × D. We 
conclude that A × B ⊆ C × D. The converse is false, but if either A = Ø 
and B = Ø, or A ≠ Ø and B ≠ Ø, then the converse is true. 
5.5. (A × U) ∩ (U × B) = (A × U) ∩ ((A ∪ A') × B) 
= (A × U) ∩ ((A × B) ∪ (A' × B)), by Problem 5.4, 
= ((A × U) ∩ (A × B)) ∪ ((A × U) ∩ (A' × B)), 
by 9b of Theorem 4.1, 
= (A × B) ∪ Ø = A × B, 
since (A × U) ∩ (A × B) = (A × B) and (A × U) ∩ (A' × B) = Ø. 
5.7. (a) a = d, b = e, and c = f implies that (a, b, c) = (d, e, f). 
Conversely, (a, b, c) = (d, e, f) implies ((a, b), c) = ((d, e), f), which 
implies (a, b) = (d, e) and c = f, which in turn implies that a = d and 
b = e, proving the assertion. 
(b) If the corresponding objects of the r-tuples are equal, the equality 
follows immediately. We now show that, conversely, 
(a_1, a_2, ..., a_r) = (b_1, b_2, ..., b_r) implies a_1 = b_1, a_2 = b_2, ..., a_r = b_r 
for any integer r > 1. We know the result is true for r = 2 and r = 3 
by part (a). Assume the result is true for the integer k, where k > 1. 
Using the definition of an ordered (k + 1)-tuple, 
(a_1, a_2, ..., a_k, a_{k+1}) = (b_1, b_2, ..., b_k, b_{k+1}) means 
((a_1, a_2, ..., a_k), a_{k+1}) = ((b_1, b_2, ..., b_k), b_{k+1}). 
This equality of ordered pairs implies that a_{k+1} = b_{k+1} and 
(a_1, a_2, ..., a_k) = (b_1, b_2, ..., b_k). But by the induction hypothesis, it 
follows that a_1 = b_1, ..., a_k = b_k, which establishes the result for 
r = k + 1. Hence we have shown that if the statement is true for 
r = k, where k is any integer greater than 1, then it is also true for 
r = k + 1. This, together with the result that it is true for r = 2, 
completes the proof for all integers r > 1. 


Chapter 2 
1.1. (a) The set D in (1.3);
(c) S = {PN, PD, PQ, NP, ND, NQ, DP, DN, DQ, QP, QN, QD};
(e) S = {(0, 2), (1, 1), (2, 0)}, where (0, 2), for example, represents
the outcome of zero objects in cell 1 and two objects in cell 2;
(g) S = {FFF, FFM, FMF, FMM, MFF, MFM, MMF, MMM};
(i) S = {0, 1, 2, ···, 7}, the set of possible numbers of heads, or with
more detail,

S = {(x₁, x₂, ···, x₇) | xᵢ ∈ A, i = 1, 2, ···, 7},
where A = {H, T}.
1.3.
1.5. 
1.7. 
2.1. 
2.3. 


2.5. 


3.1. 
3.3. 


3.5. 


3.7. 
3.9. 


4. 

All are suitable except (b) and (e). 

S = {1b, lw, 2b, 2w} or S = (1bi, lus, lus, 2b», 20s, 2w}. 

(a) E = (A, Ka +++, 2}; (Q E = {43}. 

(a) Let A = {1, 2, ···, 365}. Then
E = {3} × A × A × ··· × A (r − 1 A's),

F = A × {28} × A × ··· × A (r − 1 A's in all),

E ∩ F = {3} × {28} × A × ··· × A (r − 2 A's);

(b) n(E) = n(F) = 365^(r−1), n(E ∩ F) = 365^(r−2), and n(E ∪ F) =
729(365)^(r−2) (cf. Example I.3.4).
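
The count of E ∪ F follows from n(E) + n(F) − n(E ∩ F), and the factor 729 is just 2·365 − 1. The sketch below checks the closed form and, as a sanity test, compares it with direct enumeration on a hypothetical 6-day "year"; the function names and the small year are assumptions introduced for illustration.

```python
from itertools import product

def union_count_formula(r, days=365):
    """n(E ∪ F) by inclusion-exclusion: n(E) + n(F) − n(E ∩ F)."""
    n_e = days ** (r - 1)       # one coordinate of the r-tuple is fixed
    n_f = days ** (r - 1)       # a different coordinate is fixed
    n_ef = days ** (r - 2)      # both coordinates are fixed
    return n_e + n_f - n_ef

def union_count_brute(r, days, first, second):
    """Count r-tuples whose 1st entry is `first` or whose 2nd entry is `second`."""
    return sum(
        1
        for t in product(range(1, days + 1), repeat=r)
        if t[0] == first or t[1] == second
    )

# Closed form from the answer: 729 · 365^(r−2), since 2·365 − 1 = 729.
for r in range(2, 6):
    assert union_count_formula(r) == 729 * 365 ** (r - 2)

# Brute-force check on a hypothetical 6-day "year" with r = 3.
small = union_count_brute(3, days=6, first=3, second=2)
print(small, union_count_formula(3, days=6))
```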

The relations are readily seen from the following: 

S = {(0, 2), (1, 1), 2,0), E = (0,25, F = (2,0), G = {0, 2). 
(a) 35; (e) ve 

(a) S = {ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF,
AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF, DEF},
assign probability 1/20 to each simple event; (05; (0i.


(a) If S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}, and
1/8 is assigned as the probability of each simple event,

then P(exactly two tails) = 3/8;

(c) If S = {(0, 2), (1, 1), (2, 0)} and we assign 1/3 as the probability of
each of the three simple events, then P(one cell empty) = 2/3.

If we assign probabilities of 1/4, 1/2, and 1/4 to the simple events {(0, 2)},
{(1, 1)}, and {(2, 0)} respectively, then P(one cell empty) = 1/2.

The latter assignment is preferred;

(e) If S = {(x₁, x₂, ···, x₇) | xᵢ ∈ A, i = 1, 2, ···, 7}, where A = {H, T},
and we assign the same probability to each simple event of S,

then P(all coins fall heads) = (1/2)⁷;

(g) If S = {Sun., Mon., Tues., Wed., Thurs., Fri., Sat.} and we assign
to each simple event the probability 1/7, then P(13th day falls on Sunday)
= 1/7. But see American Mathematical Monthly, vol. 40 (1933),
p. 607, for a demonstration that the 13th day is more likely to be Friday
than any other day of the week.
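
The equally likely assignments in parts (a), (c), and (e) can be reproduced by enumerating the sample spaces. The sketch below is an illustration only: it assumes fair coins and, for the cell problem, two distinguishable objects each placed at random in one of two cells, which is the reading under which the second assignment in (c) arises. The helper `prob` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

# (a) Three coins: exactly two tails.
three = list(product("HT", repeat=3))
p_two_tails = prob(lambda o: o.count("T") == 2, three)

# (e) Seven coins: all heads.
seven = list(product("HT", repeat=7))
p_all_heads = prob(lambda o: all(c == "H" for c in o), seven)

# (c) Two distinguishable objects, each dropped into cell 1 or cell 2.
placements = list(product((1, 2), repeat=2))
occupancy = Counter((p.count(1), p.count(2)) for p in placements)
occupancy_probs = {k: Fraction(v, len(placements)) for k, v in occupancy.items()}
p_one_empty = occupancy_probs[(0, 2)] + occupancy_probs[(2, 0)]

print(p_two_tails, p_all_heads, occupancy_probs, p_one_empty)
```

Under this model the occupancy outcomes (0, 2), (1, 1), (2, 0) are not equally likely, which is one way to see why the second assignment in (c) is the preferred one.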


P(E ∩ F) = (1/365)², P(E ∪ F) = 729/(365)².
(a) 123, 132, 213, 231, 312, 321; 

(c) P(E) = P(E) = P(E;) = 3, PU U E) = 3, P(E, N E) = 4, 
P(E N E:N E) = 3, P(E, U E: U Bs) = 


2 
3° 
4.1.
4.3. 
4.5. 


4.7. 


4.9. 
4.11. 


4.13. 
P(E) = 4, 11 to 1.

5 to 4. 

js € P(F) € $, the extreme values occurring when E (| F = Ø and 
E N F = E, respectively. 

(a) S = {(x, y) | x ∈ D, y ∈ D, x ≠ y} where D is defined in Example
1.3., and we assign probability 1/2652 to each simple event of S;
(b) xw; (c) 25 to 1. 

0.8. 

(a) i; (b) i; (c) If, for any k = 1, 2, 3, ···, an integer p is selected
at random from among the first 2(10)^k positive integers, then the
probability that p is divisible by either 6 or 8 is 1/4.
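
The claim in part (c) can be checked by direct counting for the first few values of k. This is a brute-force sketch, not a proof; the function name is hypothetical.

```python
from fractions import Fraction

def prob_divisible_by_6_or_8(k):
    """Fraction of the integers 1, ..., 2·10^k that are divisible by 6 or by 8."""
    n = 2 * 10 ** k
    hits = sum(1 for p in range(1, n + 1) if p % 6 == 0 or p % 8 == 0)
    return Fraction(hits, n)

for k in range(1, 5):
    print(k, prob_divisible_by_6_or_8(k))
```

The same count follows from inclusion-exclusion: multiples of 6 plus multiples of 8 minus multiples of 24 (their least common multiple).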

(a) P(E′ ∪ F′) = 1 − P(E ∩ F), the probability of not both E and F;
(c) P(E′ ∪ F) = 1 − P(E) + P(E ∩ F), the probability of F or
not E;

(e) P(E ∩ F′) = P(E) − P(E ∩ F), the probability of E but not F.


- If E, represents selecting a spade, E; an honor card, and 23a deuce, then 


P(E U EU E) = 4$ -- 38 ds — de — de — dod do B 


Lr] 


The theorem follows immediately from Formula (4.6), Definitions 4.2
and 3.3, and by noting that if E₁, E₂, and E₃ are mutually exclusive
in pairs, then


E₁ ∩ E₂ ∩ E₃ = E₁ ∩ (E₂ ∩ E₃) = E₁ ∩ Ø = Ø.


The theorem is true for k = 2 and k = 3. (Cf. Theorem 4.5 and Problem
4.17.) Assuming the theorem to be true for any k events (the induction
hypothesis), we must show that it is true for k + 1 events.
This, plus the fact that the theorem is true for k = 2, will complete
the proof for all integers k > 1. Now
P(E₁ ∪ E₂ ∪ ··· ∪ Eₖ ∪ Eₖ₊₁) = P((E₁ ∪ E₂ ∪ ··· ∪ Eₖ) ∪ Eₖ₊₁).
But by Theorem I.4.2. and since E₁, E₂, ···, Eₖ₊₁ are mutually exclusive
in pairs, it follows that
(E₁ ∪ E₂ ∪ ··· ∪ Eₖ) ∩ Eₖ₊₁ = Ø.

Hence, by Theorem 4.5 and the induction hypothesis,
P(E₁ ∪ E₂ ∪ ··· ∪ Eₖ ∪ Eₖ₊₁)

= P(E₁) + P(E₂) + ··· + P(Eₖ) + P(Eₖ₊₁),


establishing the theorem for k + 1 events and thus completing the
proof.
4.21.


5.1. 


5.19. 


2 
7 OOO 


. First derive the identity in Problem 5.16(f) and then use P(E A F) 


1 
3-2-1 


(a) 1 


=2 
if 3 


1 
2-1 * 
1 z 1 ji . 5. 
C r-s1*42i1 4e & 
(c) In general, the probability of at least one match is

1 − 1/2! + 1/3! − ··· + (−1)^(N+1)/N!,

where N! denotes the product of the first N positive integers.
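
The alternating series for the probability of at least one match can be compared with a direct enumeration of all N! arrangements for small N. The sketch assumes every arrangement is equally likely; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_match_formula(n):
    """1/1! − 1/2! + 1/3! − ··· ± 1/n!, the probability of at least one match."""
    return sum(Fraction((-1) ** (j + 1), factorial(j)) for j in range(1, n + 1))

def p_match_brute(n):
    """Fraction of the n! arrangements with at least one item in its own place."""
    perms = list(permutations(range(n)))
    hits = sum(1 for perm in perms if any(perm[i] == i for i in range(n)))
    return Fraction(hits, len(perms))

for n in range(1, 7):
    print(n, p_match_formula(n), p_match_brute(n))
```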


1 1 
=a tg" * 


events are not assigned equal probabilities.) 


. $49 - 245 = .59, approximately. 


(0i Gh 


(b) (i) S = {(x₁, ···, x_N) | xᵢ ∈ {H, T}, i = 1, 2, ···, N}; assign probability
(1/2)^N to each simple event of S.
Gi) 2-1/2" = 4. (iii) 5/8) = + 


a 


1 
Je 


(a) .00359; (b) (1 — .00359)(.00380) = .00379; 
(c) (1 — .00359)(1 — .00380)(.00396) = .00393. 
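
The three products above follow one pattern: survive each earlier year, then die in the given year. The sketch below reproduces the arithmetic; reading the three rates as successive one-year death rates is an assumption made for the illustration, and the function name is hypothetical.

```python
# Death rates taken from the answer above; treating them as successive
# one-year rates is an assumption made for this sketch.
rates = [0.00359, 0.00380, 0.00396]

def prob_death_in_year(rates, year):
    """P(survive years 0..year-1 and then die in `year`)."""
    survive = 1.0
    for q in rates[:year]:
        survive *= 1.0 - q
    return survive * rates[year]

results = [round(prob_death_in_year(rates, y), 5) for y in range(3)]
print(results)
```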


V 


P(E)P(F), which follows from the given inequality. 
a, b, c, d, and e must satisfy the following equations: 


a € c 2 
a+b 8 ctd+e 6 


- Solving these equations, we find the unique solution 


a+b+ct+d+e=l], atb=> 
———' 
c+d+e 5 
a=%, b=%, c=} doi» and ¢ = §. 

9 


(a) Plan 1: P(E) = he Tj yj 9 
Plan 2: P(E) = 1 — ea? 


Pl s: pa = (Sic s (9.2) Ea x \/9-— a 
"e PA) 100 a ) t? 1:9 Aw 98 / 
6.1.


6.3. 
(a) Define S = {(x, y) | x ∈ C, y ∈ C, and x ≠ y}, where
C = {B₁, B₂, B₃, B₄, G₁, G₂}, the set of six children. Assign probability
1/30 to each simple event and note that there are ten elements in the
subset E for which the second child is a girl. Thus P(E) = 10/30 = 1/3.

(b) P(E) = (0) + (D = 5. 

The probabilities needed are: P(E) = 0.254, P(E’) = 0.746, 

P(E|E) = 442, PURE) = s, POE) = $$, P(Es\E) = z4r, 
P(E,\|E’) = 348, P(E.|E’) = 443, P(ESE) = 235, PRAE) = = Ht 


6.5. 3. 

6.7. $1 = .90, approximately. 

6.9. 33. 

6.11. 4. 

6.13. E ∩ Eᵢ ⊂ E, i = 1, 2, ···, n, by Definition I.3.1., demonstrating
condition (i) of Definition 6.1. Also, use Theorem I.4.1. to show that for
i ≠ j, (E ∩ Eᵢ) ∩ (E ∩ Eⱼ) = Ø follows from the hypothesis that
Eᵢ ∩ Eⱼ = Ø. Thus condition (ii) holds. Finally, since {E₁, ···, Eₙ} is
a partition of S, if x is any element of E ⊂ S, then there exists some
Eᵢ such that x ∈ Eᵢ. Then x ∈ (E ∩ Eᵢ), which demonstrates condition (iii).


7.1.
7.3.


7.1.
7.9.

Independent events in (a) and (b), dependent in (c). 

(a) P(E)P(P) = Gs) # P(E N F) = 4; (b) Ifn = 3, the events 


are independent. (Cf. Example 7.3.) To prove the “only if” part, we
note that if S = {(x₁, ···, xₙ) | xᵢ ∈ {H, T}, i = 1, 2, ···, n} and we
assign probability (1/2)^n to each simple event of S, then

P(E) = (n + 1)/2^n, P(F) = (2^n − 2)/2^n, P(E ∩ F) = n/2^n.

If E and F are independent, then

((n + 1)/2^n)((2^n − 2)/2^n) = n/2^n,

which implies that n + 1 = 2^(n−1) or n = 3.


- (a) Let S be the set of 7460 females in the sample. 1/7460; 


(c) 0.143; (e) 0.014, approximately. 
All independent. 
Since P(F) = 1 and, by Theorem 4.2, P(E ∪ F) ≥ P(F), it follows
that P(E ∪ F) = 1. Now use Theorem 4.4 to show that
P(E)P(F) = P(E ∩ F).
7.11.


7.13. 
8.1. 


8.3. 


8.5. 
8.7. 


8.9. 


8.11. 
9.1. 


9.3. 


No. For counterexample, choose any event F with P(F) = 1 and let 
E and G be any dependent events. (Cf. Problem 7.9.) 
(a) P(S)P(A) = 0G) # P(S A A) = 7s; (b) 9. 
P(E)P(E) = Q)G) = P(E N E2, 
P(E)P(E) = (0G) = P(E: N E, 
P(E2)P(Es) = (8)(3) = P(E: N E), 
but P(E,)P(E2)P(Es) = +s # P(E: N E:N Es) = 0. 
P((E₁ ∩ E₂) ∩ E₃) = P(E₁ ∩ E₂ ∩ E₃) = P(E₁)P(E₂)P(E₃), since we
know Equation (8.3) holds. But then P(E₁)P(E₂) = P(E₁ ∩ E₂), and
thus P((E₁ ∩ E₂) ∩ E₃) = P(E₁ ∩ E₂)P(E₃). That E₁ and E₃ are not
necessarily independent may be seen by letting E₂ = Ø, and E₁ and E₃
be any dependent events.
Twice, with probability .46. 
P(E₁ ∩ E₂) = P(E₁)P(E₂) by hypothesis. Consider the case P(E₃) = 0.
Then since 0 ≤ P(E₁ ∩ E₃) ≤ P(E₃) = 0, it follows that
0 = P(E₁ ∩ E₃) = P(E₁)P(E₃).

By an identical argument, P(E₂ ∩ E₃) = P(E₂)P(E₃). Similarly,

P(E₁ ∩ E₂ ∩ E₃) = P(E₁)P(E₂)P(E₃) = 0.
In the case P(E₃) = 1, since P(E₃) ≤ P(E₁ ∪ E₃), it follows that
P(E₁ ∪ E₃) = 1. Then, by Theorem 4.4, P(E₁ ∩ E₃) = P(E₁)P(E₃).
By the same argument P(E₂ ∩ E₃) = P(E₂)P(E₃). Also since P(E₃) =
1, it follows that P(E₁ ∪ E₂ ∪ E₃) = 1, and then, by using the result
of Problem 4.14,

P(E₁ ∩ E₂ ∩ E₃) = P(E₁)P(E₂)P(E₃).
z : T One needs to prove that if E, Es ---, En are independent 
events, then E/, EZ, +++, E; are also independent. 
012. 
(a) Sample space is S × S × S, where S = {H, T}, and 1/8 is the probability
assigned to each simple event of S × S × S;
(b) Using the same sample space as in (a), we assign the probability
p^k q^(3−k) to each simple event whose 3-tuple contains k H's and therefore
3 − k T's.
The sample space is S × S × ··· × S (ten S's) where
S = {correct, incorrect}. (1/6)^k (5/6)^(10−k) is the probability assigned to each
simple event whose 10-tuple contains exactly k “corrects.”
P(9 or 10 correct) = 51(1/6)^10 = .00000084, approximately.
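
The value for P(9 or 10 correct) can be reproduced by summing two binomial terms. In the sketch below the single-trial success probability 1/6 is an assumption (it is the value consistent with the stated result .00000084), and `binomial_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)   # assumed chance of a correct answer on one trial
n = 10               # ten independent trials

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_9_or_10 = binomial_pmf(9, n, p) + binomial_pmf(10, n, p)
print(p_9_or_10, float(p_9_or_10))
```

Exact fractions are used so that the factor 51 = 10·5 + 1 in the numerator is visible before converting to a decimal.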
9.5. .784.
9.7. 7. 


9.9. (a) S, X Sa = (b Band D; (d) B, D, A, and Y. 


"EN NEC 

(n — 1)’ 

10.1. (a) u; = us = gy, 20. = 2» = 4, Wr = We = 4; 
(c) w = us = 1, 20, = w: = w, = w = 0. 


10.3. (a) i = Py, 2n = ye, Wi =, Ue AS We = HBO, we = FEL 
fo = 4, fi = $4, fo = 16,645/22,396. 
10.5. Substituting f, = 1/g, in (10.12), we have 


1 1— 1 EN 
Grit LI gal 
and taking reciprocals we have (10.14). 


Chapter 3 
1.1. (a) 56; (c) 126; (e) 1,260. 
1.8. (a) .18; (c) .16. 


«6 (3) (2) - 


1.7. 432,510. 
-o (7); @ (3) e (5); 
(d) by solving 6 (2) = (1) +3 H T (2) find n = 6. 


515! 
141. (a) e! z as (b) Same as (a). 


1.13. (a) à; (b) H. 
1.15. (a) .251; (b) .215; (c) .633. 


1.17. 8 PESE = .00033, approximatel: 
2,3,3/)\6)\6/\6) 7 "9 appro 9s 


1.19. .19, approximately. 


1. 


a 


PE 


e 


1.21. po = .16, pı = .31, p» = .29, ps = .16, p, = .06, ps = .00, where pn is 
the probability to two decimal places that sample contains exactly n 
defectives. 
JO a oy Ce) 2, LG


' 4 
H 165' 9 ) 2r "( 7 35 
1,4, 4,2 4,3,2 42,1 


1.25. We give the number of different poker hands of each kind. The required
probability is this number divided by 2,598,960, the total number of
poker hands. (a) 1,302,540; (b) 123,552; (c) 54,912; (d) 10,200;
(e) 5108; (f) 3744; (g) 624; (h) 40.
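
These counts can be recomputed from binomial coefficients. The sketch below is an illustration under an assumption: the labels attached to each count (no pair, two pairs, and so on) are the standard poker categories that produce these numbers, since the problem statement itself is not reproduced here.

```python
from math import comb

total = comb(52, 5)                                    # all five-card hands

straight_flush = 10 * 4                                # 10 rank runs, 4 suits
four_of_a_kind = 13 * 48                               # rank of the four, any fifth card
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)         # triple, then pair
flush = 4 * comb(13, 5) - straight_flush               # same suit, not a straight
straight = 10 * 4**5 - straight_flush                  # rank run, not all one suit
three_of_a_kind = 13 * comb(4, 3) * comb(12, 2) * 4**2
two_pairs = comb(13, 2) * comb(4, 2) ** 2 * 44
no_pair = (comb(13, 5) - 10) * (4**5 - 4)              # distinct ranks, no straight or flush

counts = [no_pair, two_pairs, three_of_a_kind, straight,
          flush, full_house, four_of_a_kind, straight_flush]
print(total, counts)
print([round(c / total, 6) for c in counts])
```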


1.23. (a) 


1.27. (a) S = Sı X Sı X Sı X Sı, but the probabilities of simple events of 
S are not assigned according to the product rule. In general, knowledge 
of any hand changes the probability of any of the other hands having a 
certain ye 


(b) COGSX e Xi) - (C) this latter ratio being the 
EDD ^ (3 


answer to (c); (d) Refer to Formula IL.9.8, using P(E)) = P(C). 


13 H 13\/13 
139. (a) BAAAB AT 


(5) 


— .13, approximately; 


— .21, approximately ; 


4! /13 rj 3 
(c) ži t S e .11, approximately. 
(is) 
From the preceding problem, the probability that the queen falls is


.407 + (1/4)(.497) = .531.
Hence the odds are approximately 53 to 47, or 1.13 to 1.


1.31. 


2.1. (a) p⁵ + 5p⁴q + 10p³q² + 10p²q³ + 5pq⁴ + q⁵;


(c) a⁴ − 12a³b + 54a²b² − 108ab³ + 81b⁴.
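
Both expansions come from the binomial theorem, so their coefficients can be generated mechanically. In the sketch below, reading part (c) as the expansion of (a − 3b)⁴ is an inference from the coefficients, and the function name is hypothetical.

```python
from math import comb

def binomial_coefficients(n, x_coeff=1, y_coeff=1):
    """Coefficients of (x_coeff·x + y_coeff·y)^n, ordered by decreasing power of x."""
    return [comb(n, k) * x_coeff ** (n - k) * y_coeff**k for k in range(n + 1)]

part_a = binomial_coefficients(5)                 # (p + q)^5
part_c = binomial_coefficients(4, y_coeff=-3)     # (a − 3b)^4
print(part_a)
print(part_c)
```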


2.3. (a) 1.072; (b) 1.219; (c) .922. 
2.5.


2.7. 


2.9. 


2.11. 


11. 


1.3. 


1.5. 


1.7. 


1.9. 
(a) $r\binom{n}{r} = \dfrac{r\,n!}{r!\,(n-r)!} = \dfrac{r\,n\,(n-1)!}{r\,(r-1)!\,(n-r)!} = n\binom{n-1}{r-1}$;


(c) $\binom{n-1}{r-1} + \binom{n-1}{r} = \dfrac{(n-1)!}{(r-1)!\,(n-r)!} + \dfrac{(n-1)!}{r!\,(n-r-1)!} = \dfrac{r\,(n-1)! + (n-r)(n-1)!}{r!\,(n-r)!} = \binom{n}{r}$;


(e) $\binom{n}{r+1} = \dfrac{n!}{(r+1)!\,(n-r-1)!} = \dfrac{n!\,(n-r)}{(r+1)\,r!\,(n-r)\,(n-r-1)!} = \dfrac{n-r}{r+1}\binom{n}{r}$.


(a) $\binom{x}{r} = \dfrac{x(x-1)(x-2)\cdots(x-r+1)}{r!}$. If x is an integer
such that 0 ≤ x < r, then a term of the numerator above is zero. Hence

$\binom{x}{r} = 0$, as defined in Equation (2.10). If x ≥ r and an integer,
then by multiplying the above expression by $\dfrac{(x-r)!}{(x-r)!}$ we have

$\binom{x}{r} = \dfrac{x!}{r!\,(x-r)!}$, as previously defined.


(a) 1; (b) 252; (c) 12,600. 


Chapter 4 
(a) f(z) = 4,4, $, & for z = 0,1,2,3, respectively, f(z) — 0 otherwise; 
(b) F(z) = Oif z <0, F(z) = 1 if0 Ez«lF()-i£iflzz«c2 
Mz) = Fif2 <2 <3, P(e) =1ife >3, 
(a) f(z) = 45, $8, H for x = 0, 1, 2, respectively, f(x) = 0 otherwise; 
(b) F(z) = 0ifz <0, F@) =38if0 <2< 1, F(z) = $8if1 <2 <2, 
F(z) = life > 2, 
(@) B= 4; (D 41,3; (© 2 (d Fe) =0if 2 <0, Fe) = 4 if 
0<2<1, F(z) =F if1 S2<2, F(x) 2 lifz 2. 
b) 4, 4, Ps, 4, 3, 3, 1, 3; 
(c) f(z) = 4, 4, 5 3 for z = —1, 1, 2, 3, respectively, f(z) = 0 other- 
wise. 


$f(x) = \dfrac{\binom{13}{x}\binom{39}{13-x}}{\binom{52}{13}}$ for x = 0, 1, ···, 13; one finds (with three
decimal place accuracy)
2.1.


2.3. 
2.5. 
2.7. 


2.9. 
2.11. 
2.13. 
2.15. 


2.17. 
2.19. 


f(x) = .013, .080, .206, .286, .239, .125, .042, .009, .001
for x = 0, 1, 2, 3, 4, 5, 6, 7, 8 respectively, and f(x) = 0 for all other x.
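
The listed values agree with a hypergeometric probability function for the number of cards of one given suit in a 13-card hand. The sketch below recomputes them under that reading, which is an assumption inferred from the numbers rather than from the problem statement; `f` is defined here only for the check.

```python
from math import comb

def f(x, suit=13, others=39, hand=13):
    """P(exactly x cards of the given suit in a `hand`-card deal from 52 cards)."""
    return comb(suit, x) * comb(others, hand - x) / comb(suit + others, hand)

values = [round(f(x), 3) for x in range(9)]
print(values)

# Values for x = 9, ..., 13 are positive but round to .000 at three decimals.
print([round(f(x), 3) for x in range(9, 14)])
```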


(a) The event (X ≤ b) is the union of the two mutually exclusive


events (X ≤ a) and (a < X ≤ b). Hence F(b) = F(a) + P(a < X ≤ b)
from which the result follows;

(c) The event (a ≤ X ≤ b) is the union of the mutually exclusive
events (a ≤ X < b) and (X = b). Hence, using result in (b),


F(b) − F(a) + f(a) = P(a ≤ X < b) + f(b).

(a) X₁ and X₂ are not equal, for they have different domains; but their
probability functions are both given by f where f(1) = f(2) = 1/2,
f(x) = 0 if x ≠ 1 or 2;

(b) Let S = {o₁, o₂, ···, oₙ} be any set with at least two elements.
Make an acceptable assignment of probabilities to the simple events
of S so that some one simple event, say {o₁}, has probability 1/2. Define
X by X(o₁) = 1, X(oⱼ) = 2 if j ≠ 1. Then X has the probability function
f defined in (a). We get a different random variable X with each
choice of S and there are infinitely many sets from which to choose S.


(a) 3; 

(b) EY) = (—1.5)(8) + C79) + C99) + (1.5) (8) = 0; 
(c) E(Z) = 3. 

E(X) = —.05 of a dollar. 

1. 


(a) Mean net profit (in cents) is —80, 50, 100, 90, when stock is 0, 1, 
2, 3 flowers, respectively; (b) At least $1.00. 

b = 1.04, E(Y) = 0. 

$20. 

E(X?) = 932, LE(X)T = 49. 

(a) f(—e) = 48, © = ss fe) = 265 f(Be) = sis 

(b) E(X) = —17e/216 or —.08, approximately, when e = 1. 


E(X) = 3. 
(a) P(X, = k) = 4g fork = 0, 1, +++, 95 
P(X: = k) = dy fork = —3, —2, ---, 3, 4, 
P(X: = k) = dj fork = 5,6, P(X: = k) = yy fork = 7, 8, 9; 


(b) E(X;) = $4.50, E(X:) = $4.25, accept option 1. 
2.21. (a) Unique if p ≠ F(xₖ) for k = 1, 2, ···, N. If there is a possible


value xₖ of X for which F(xₖ) = p, then there are infinitely many
medians.


2.23. From the hypothesis it follows that if a + dⱼ is a possible value of X
for any number dⱼ, then a − dⱼ is also a possible value and
f(a + dⱼ) = f(a − dⱼ).


Suppose there are p such pairs. Then
$E(X) = a f(a) + \sum_{j=1}^{p} (a + d_j) f(a + d_j) + \sum_{j=1}^{p} (a - d_j) f(a - d_j)$

$= a\left[f(a) + \sum_{j=1}^{p} f(a + d_j) + \sum_{j=1}^{p} f(a - d_j)\right] = a,$


since the sum in brackets is the sum of f(xₖ) over all possible values xₖ
of X, and hence equals 1.


3.1. ux = 3, ok = 8, ox = 0.43, approximately; 

Ly = 7000, o} = 4,500,000, cy = 2121, approximately; 
Mz = $, 0% = 34, oz = 1.09, approximately. 

$3. of = 3& ox = 2.41, approximately, 

3.5. (a) Var(X;) = k/4; (b) Var(X;) = kp(1 — p). 
3.7. (a) 2504; (b) 16; (c) 4; (d) 4; (e) 2. 


3.9. Use Theorem 2.1 to find


$E(Y) = \sum_{k=1}^{N} (a + b x_k + c x_k^2) f(x_k) = a + bE(X) + cE(X^2).$


Now use (3.10).


- (a) If f, g, h are probability functions of X for methods (1), (2), and 
(3), respectively, then f(1) = 1; g(0) = fy o0) = 43, g(2) e 
9(3) = 2r; AO) = 3, AC) = 2, AG) = 1; : 

(b) E(X) = 1 for each method; 
(c) Standard deviations of X for the methods are 0, √6/3, and 1
respectively.


3.13. Mean absolute deviations are .8125, 2.3125, .8125, and 1.625 for X₁,
X₂, X₃, and X₄, respectively.


3.15. (a) —1.5,0,.3; (b) go, 90, 96, 113. 
3.17. In each case, Chebyshev's inequality says probability is greater than 5/9
for z = 1.5 and greater than 3/4 for z = 2. The actual probabilities are,
for z = 1.5 and z = 2 respectively: (a) 1,1; (b) $, 15; (c) & 1. 


3.19. z = √2, √10, √20, 10.


3/8 3/8 
0 1/8 


4.3. 


P(Y = y) 


X and Y are dependent. 


(Gs 5 -.) 
4.5. h(a, y) CPAN v ZZY if x and y are any nonnegative 


13 
integers for which x + y ≤ 13, h(x, y) = 0 otherwise. X and Y are
dependent since h(13, 13) = 0, but f(13) > 0 and g(13) > 0,


so that h(13, 13) ≠ f(13)g(13).


4.7, Independence follows from Theorem 4.2 by considering the four tosses 
as two independent trials of two tosses each. 


413. 


4.15. 


5.1. 


5.3. 


5.5. 
4 5


1/25 1/25 
1/20 1/20 
1/15 1/15 
1/10 1/10 
0 0 1/5 


12/300 27/300 47/300 77/300 137/300 


(c) P(Y = y|X = 8) = à for 
(d) P(X = z|Y = 3) = 12, 15, 22 for z = 1, 2, 3, respectively; 
(e) Ts, 368- 


“a 
i] 
Se 
m 
a 


ifz<Oorifz<1 

if0 <2t<land1 <z<3 
ifücz«land3 <z 
if1 <zand1<z<3 
ifl <azand3 <z 


Using notation in Table 29, let ci; = f(zj)/f(z;) and show that if X 
and Y are independent, then for k = 1,2, -. 


P(X <2,2<2= 


Lr ae ue ale c 


-, N we have 

hi, Ye) = cijh(zi, yr). 
(c) Let X have exactly two possible values differing only in sign, say 
+1 and —1. Let Y be any random variable such that X and Y are 
dependent. Since X? has only one possible value, X? and Y? are in- 
dependent. 
(a) Let f(x) = P(X + Y = x). Then f(x) = .1, .2, .3, .4 for x = 2, 3,
4, 5, respectively, and f(x) = 0 otherwise; E(X + Y) = 4;
(b) Let g(x) = P(XY = x). Then g(x) = .1, .2, .1, .2, .4 for x = 1, 2,
3, 4, 6, respectively, and g(x) = 0 otherwise; E(XY) = 4.
(a) Not true for all X, Y. False for random variables in Problem 5.2(a).
True if Y = X, for example;


(c) False for random variables in Problem 5.2(c). True if Y = X; 
(e) True for all X, Y. 


(a) S0 and 13; (b) 20 and 13; (c) 210 and V 1396. 
5.7.


5.9. 


(a) Y(oₖ) = b for all oₖ ∈ S; i.e., Y is a constant function. Note that
our notation does not distinguish a constant function from the number
that is its constant value;
(b) Y is the function equal to a for all oₖ ∈ S. X and Y are independent
by the result in Problem 4.10.
First generalize Theorem 5.1 to functions of n random variables, and
thus show that

E(X₁X₂···Xₙ) = Σ v₁v₂···vₙ h(v₁, v₂, ···, vₙ)
where h(v₁, v₂, ···, vₙ) = P(X₁ = v₁, X₂ = v₂, ···, Xₙ = vₙ) and the
sum extends over all possible values v₁ of X₁, v₂ of X₂, ···, vₙ of Xₙ.
But by Definition 4.4,

h(v₁, v₂, ···, vₙ) = f₁(v₁)f₂(v₂)···fₙ(vₙ)

where fₖ is the probability function of Xₖ. Hence (as in Theorem 5.4
where n = 2), the sum can be written as a product of the n sums
Σ vₖfₖ(vₖ) for k = 1, 2, ···, n. Since the kth sum extends over all possible
values vₖ of Xₖ, it is equal to E(Xₖ) and the result follows. (Note:
A proof by mathematical induction is also possible. In such a proof
one needs to use the following fact: If X₁, X₂, ···, Xₙ are independent
and if Y = X₁X₂···Xₙ₋₁, then Y and Xₙ are independent. This result
can be proved by a method similar to that used below in the solution
to part (a) of Problem 5.11.)


(a) Let x and y be any numbers. Then


P(Yₖ = y, Xₖ₊₁ = x) = Σ P(X₁ = v₁, ···, Xₖ = vₖ, Xₖ₊₁ = x),
the summation extending over all values v₁ of X₁, ···, vₖ of Xₖ such that
a₁v₁ + ··· + aₖvₖ = y.

Now we invoke Definition 4.4 to obtain
P(Yₖ = y, Xₖ₊₁ = x) = Σ P(X₁ = v₁) ··· P(Xₖ = vₖ)P(Xₖ₊₁ = x).
Since x is a constant with respect to this summation, the term
P(Xₖ₊₁ = x) can be placed before the summation sign. The remaining
sum is just P(Yₖ = y). Hence,

P(Yₖ = y, Xₖ₊₁ = x) = P(Yₖ = y)P(Xₖ₊₁ = x),
which proves the independence of Yₖ and Xₖ₊₁.
(b) Let I be the set of positive integers for which the theorem is true.
By Theorem 3.3 and (5.10), we know that 1 ∈ I and 2 ∈ I. Now let us
assume that k ∈ I and show that then (k + 1) ∈ I for any integer k.
5.15.


6.1. 
When there are k + 1 independent random variables X₁, ···, Xₖ₊₁,
then by part (a), Yₖ and Xₖ₊₁ are independent. Hence, by (5.10),
Var(Yₖ + aₖ₊₁Xₖ₊₁) = Var(Yₖ) + a²ₖ₊₁ Var(Xₖ₊₁).
Now use the induction hypothesis (that k ∈ I) to expand Var(Yₖ) and
thus show that (k + 1) ∈ I. This completes the proof.


(a) μ_X = 1, σ_X = 1/√2; (b) The probability function of X̄ for samples
of size 2 is given by:


x̄          | 0    | 1/2  | 1    | 3/2  | 2
P(X̄ = x̄)  | 1/16 | 4/16 | 6/16 | 4/16 | 1/16


μ_X̄ = 1 and σ²_X̄ = 1/4; (c) The probability function of X̄ for samples
of size 3 is given by:


x̄          | 0    | 1/3  | 2/3   | 1     | 4/3   | 5/3  | 2
P(X̄ = x̄)  | 1/64 | 6/64 | 15/64 | 20/64 | 15/64 | 6/64 | 1/64
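
The tabulated probabilities for the sample mean can be generated by enumerating all ordered samples drawn with replacement. The sketch below assumes a population with values 0, 1, 2 and probabilities 1/4, 1/2, 1/4; that population is an inference (it has mean 1 and standard deviation 1/√2 and reproduces the tabulated sixteenths and sixty-fourths), not something stated in this answer. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed population distribution (value -> probability).
population = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def sample_mean_distribution(pop, n):
    """Probability function of the mean of n independent draws from `pop`."""
    dist = defaultdict(Fraction)
    for values in product(pop, repeat=n):
        prob = Fraction(1)
        for v in values:
            prob *= pop[v]
        dist[Fraction(sum(values), n)] += prob
    return dict(sorted(dist.items()))

def mean_and_variance(dist):
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

size2 = sample_mean_distribution(population, 2)
size3 = sample_mean_distribution(population, 3)
print(size2, mean_and_variance(size2))
print(size3, mean_and_variance(size3))
```

Under this assumed population the variance of the sample mean is the population variance 1/2 divided by the sample size.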


(a) μ_X = $5450, σ²_X = 3,322,500, σ_X = $1823, approximately;
(b) The probability function of X̄ is given by:


z 


PX-zi. : j i 32 | 08 | 04 | .04 | .01 


μ_X̄ = $5450, σ²_X̄ = 1,661,250, σ_X̄ = $1289, approximately.
Write μᵢ for E(Xᵢ) and use the definition of variance together with
(5.4) to obtain

Var(X₁ + ··· + Xₙ) = E([(X₁ − μ₁) + ··· + (Xₙ − μₙ)]²).


Now perform the indicated squaring of the sum in brackets and use
Definitions 3.1 and 6.1 to complete the proof.
6.3. (a) Letting f(z) = P(X = 7) we have
fG) = 1,4, .2,.1,.2 
for T = 259, 319, 339, 319, 399, and f(T) = 0 otherwise; 
E(X) = 108, Var(X) = +88; 

(c) f(x) = 1 for z = 108, f(z) = 0 otherwise; 

E(X) = 108, Var(X) = 0. 
6.5. σ_X̄ = $74.63, approximately, and the required interval, extending three

standard deviations on either side of the mean, is $4776 to $5224.


6.7. Set X ies = 2 dí iex (f= d and solve for nı. 


m N- 


6.9. Show that Var(X ± Y) = σ²_X + σ²_Y ± 2ρ(X, Y)σ_Xσ_Y, from which the
results are immediate.


6.11. (a) 
0 1 
(X,Y) =0 
3/8 3/8 ü ‘ 
= X and Y independent. 
1/2 1/2 
(b) 
y 
x 0 1 P(X = 2) 
0 17/24 1/24 3/4 p(X, Y) = 2/V15 = 52 
1 1/8 1/8 1/4 X and Y dependent. 
P(Y =y) 5/6 1/6 T 


6.13. ρ(X, Y) = 0. (Note that X and Y are dependent but uncorrelated.)
6.15. 


1/16 2/16 1/16 


0 
1 0 2/16 4/16 
2 0 0 1/16 


P(Y =y) | 1/16 4/16 6/16 


ρ(X, Y) = √2/2 = .71, approximately.
−1 if m < 0
6.17. m-f 0 ifm=0 
1 ifm>0. 


6.19. Without loss of generality (see Problem 6.18), we can assume that X 
and Y each have possible values 0 and 1. Then 


E(X) = P(X =1), E(Y)=P(Y = 1), E(XY) =P(X=1,Y = 1) 
so that Cov(X, Y) = 0 implies 

P(X = 1, Y = 1) = P(X = 1)P(Y = 1).
Show then that the other three joint probabilities must also be products 
of the corresponding marginal probabilities. 


Chapter 5 
1.1. Probabilities are .107 for (a) and .069 for (b). 
1.3. .000006. 


1.5. Using Theorem 1.2, find p = .60 and n = 20. Required probabilities 
are (a) .998 (b) .126 (c) .245. 


1.7. (a) .772; (b) .746; (c) p = .370. 
1.9. n > 69. 


1.11. Corresponding to the values of p given in Table 37, the probabilities of 
accepting the lot are .983, .736, .392, .069, .008, .001, and .000 in 
part (a), and .904, .599, .349, .107, .028, .006, and .001 in part (c). 


1.13. (a) By (1.3), $b(n-k;\, n,\, 1-p) = \binom{n}{n-k}(1-p)^{n-k}p^{k}$. Now


recall that
$\binom{n}{n-k} = \binom{n}{k}$;


(b) By ON b(k|n, p) = à b(n — k|n, 1 — p) 


= E, Mh, 1 — p. 


Normal Approx. 
1.17.


2.1. 


2.3. 


2.5.


$G(t) = \sum_{k=0}^{n} \binom{n}{k}(pt)^{k}q^{n-k}$, which equals (q + pt)^n, by the binomial
theorem.

(a) Accept null hypothesis if X ≥ 0; i.e., accept no matter what result
is obtained from the sample of 20. The probability of an error of the
second kind is then 1.

(b) Reject the null hypothesis no matter what value of X occurs.
Then α = 1.

(c) Reject null hypothesis if and only if X ≥ 16. From Table 40 with
c = 16, find 1 − π(.70) = .762, 1 − π(.80) = .370, π(.50) = .006.

(d) .126, accept null hypothesis and wage very expensive campaign.
(e) P(X ≥ 17) = .016 and P(X ≥ 18) = .004; therefore at least 18
must favor Smith.
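
The tabled values quoted in parts (c) through (e) are upper-tail binomial probabilities for a sample of 20, so they can be recomputed directly. In the sketch below the cutoff "reject when X is at least 16" and the value p = .60 used for parts (d) and (e) are assumptions inferred from the quoted numbers; `upper_tail` is a hypothetical helper.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(c, n, p):
    """P(X >= c) for X binomial with parameters n and p."""
    return sum(binomial_pmf(k, n, p) for k in range(c, n + 1))

n = 20  # sample size stated in part (a)

# Part (c): decision rule "reject if and only if X >= 16".
accept_70 = round(1 - upper_tail(16, n, 0.70), 3)
accept_80 = round(1 - upper_tail(16, n, 0.80), 3)
reject_50 = round(upper_tail(16, n, 0.50), 3)

# Parts (d) and (e), computed with p = .60 (assumed).
tail_15 = round(upper_tail(15, n, 0.60), 3)
tail_17 = round(upper_tail(17, n, 0.60), 3)
tail_18 = round(upper_tail(18, n, 0.60), 3)

print(accept_70, accept_80, reject_50, tail_15, tail_17, tail_18)
```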

(a) Null hypothesis: p = 1/2; Alternate hypothesis: p > 1/2. From Table


37, find
P(X ≥ 14) = .058, P(X ≥ 15) = .021.


Hence reject null hypothesis if and only if X, the number who revert
to first-learned method, is at least 15.

(b) Significant at the 5% level, since P(X ≥ 15) = .021 < .05. Not
significant at the 1% level.

(a) Let a trial (observing a machine for a day) result in success (machine
needs repair) or failure (machine does not need repair). Let
p = probability of a success. Null hypothesis: p = .20; Alternate
hypothesis: p ≠ .20.

(b) Since mean number of successes is np = 4 if the null hypothesis is
true, we reject null hypothesis if X, number of successes observed, is
either too much larger or too much smaller than four; i.e., we reject
null hypothesis if and only if either X ≤ 4 − d or X ≥ 4 + d, where d
denotes the smallest deviation from the mean that makes X “too much”
larger or “too much” smaller than the mean. The number d is determined
by requiring the probability of an error of the first kind to be
no larger than .10 but as close to .10 as possible. This error probability
is P(X ≤ 4 − d) + P(X ≥ 4 + d), calculated for p = .20. If d = 3,
this probability is greater than .10; if d = 4, it is less than .10 (from
Table 37). Hence, reject null hypothesis if and only if X = 0 or X ≥ 8.
(This is called a two-tailed test.)

(c) Probability that X deviates from its mean in either direction by
at least as much as the observed value does is P(X ≤ 1) + P(X ≥ 7)
or .069 + .087 = .156. Not significant at .10 level.

(d) Ideal decision rule has π(p) = 0 if p = .20, π(p) = 1 if p ≠ .20.
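
The search for the deviation d in part (b) can be carried out numerically instead of from a table. The sketch below assumes n = 20 trials (so that np = 4 with p = .20, as stated) and uses hypothetical helper names; it finds the smallest d whose two-tailed probability does not exceed .10.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_tail_prob(d, n=20, p=0.20, mean=4):
    """P(X <= mean − d) + P(X >= mean + d) for X binomial(n, p)."""
    lower = sum(binomial_pmf(k, n, p) for k in range(0, mean - d + 1))
    upper = sum(binomial_pmf(k, n, p) for k in range(mean + d, n + 1))
    return lower + upper

# Smallest deviation whose error probability of the first kind is at most .10.
smallest_d = next(d for d in range(1, 17) if two_tail_prob(d) <= 0.10)

for d in (3, 4):
    print(d, round(two_tail_prob(d), 3))
print("smallest d:", smallest_d)
```

With d = 4 the lower tail reduces to X = 0 and the upper tail to X ≥ 8, which is the two-tailed rejection region given in part (b); the d = 3 value is the significance probability used in part (c).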


INDEX 


ABSOLUTE value, 191 
Acceptable assignment of probabilities, 
55 
for repeated trials, 114, 120 
Algebra of sets, 28-38 
Alternate hypothesis, 273 
A posteriori probability, 94 
A priori probability, 94 
Associative laws (for sets), 29 
Average (see Mean) 
Average covariance, 247 
Average fraction defective, 254, 286 


Bayes’ formula, 93 
Bayes, Thomas, 91 
Bernoulli, James, 228, 253 
Bernoulli process, 254 
for production run, 286 
by sampling with replacement, 264 
Bernoulli trials, 253 
Best decision rule, 291 
Binomial coefficients, 150 
generalized, 157 
identities for, 156, 157 
Pascal’s triangle, 151 
properties of, 151 ff 
recursion formula, 156 
Binomial distribution, 256 
fitted to frequency distribution, 266 
generating function, 270 
as limit of hypergeometric, 271 
maximum value, 269 
mean, 265 
recursion formula, 269 
standard deviation, 265 
standardized, 267 
table, 259-60 
variance, 265 
Binomial parameter p, maximum-likeli- 
hood estimate, 272 


Binomial probabilities, approximation 
for hypergeometric, 271 

Binomial probability function (see 
Binomial distribution) 

Binomial theorem, 149, 257 

Birthday example, 48, 52, 59, 88 

Blocking coalition, 27 

Blood type, 22 

Boole, George, 34 

Boolean algebra, 34 

Brace notation (for sets), 3 

Bridge, 49, 139, 148, 210, 268 


CANTOR, Georg, 1 
Card guessing, 196, 219, 247 
Cards (see Bridge; Card guessing; 
Matching of cards; Poker) 
Cartesian product, 40 
graph of, 41 
number of elements, 43 
Center of gravity, 179 
Certain event, 52 
probability of, 65 
Changes in scale and location of origin, 
effect of: 
on correlation coefficient, 249 
on mean, 178 
on variance, 191 
Characteristic random variable, 171 
Chebyshev, P. L., 192 
Chebyshev’s inequality, 194 
applied to binomial, 267 
generalized, 197 
in proof of law of large numbers, 
226 
Chuck-a-luck, 182 
Coalitions, 27 
Commutative laws (for sets), 29 
Complementary event, 52 
probability, 67 
Complement (of a set), 17
laws for, 28 
Composite function, 175 
Compound experiment, 78 
Compound probabilities, theorem on, 
78 


Conditional mean, 249 
Conditional probability, 76 
Conditional probability function, 206 
Correlation coefficient, 241 
effect of changes in scale and location 
of origin, 249 
properties of, 242-246 
Correlated random variables, 241 
Cost of sample inspection, 287 
Counting, fundamental principle of, 9 
Counting techniques, 132-148 
fundamental principle, 9 
objects in cells, 133, 135 
ordered r-tuples, 132, 135 
permutations, 132, 135, 139 
in probability problems, 139-48 
r-subsets, 133, 135 
Covariance, 232 
average, 247 
Critical set, 274, 280 
Cross-partition, 101 
Cumulative binomial probabilities, ta- 
ble, 259-60 
Cumulative distribution function, 171 
(see also Distribution function) 


DECISION-MAKING example, 286 
Decision rule: 
ideal, 276 
(n, c) type, 290 
operating-characteristic curve, 90, 
269 
power function, 275 
in sampling inspection, 90, 269 
to test null hypothesis, 274 
De Morgan's laws, 29 
generalized, 36 
proof by membership table, 31 
verification by Venn diagram, 32 
Dependent events, 102 
Dependent random variables, 204 
Descartes, René, 41 
Dice, E^ 57, 58, 160-65, 173, 190, 
Dictator, 27 
Difference equation, 128 
Disjoint sets, 20 




Dispersion, 185 
Distribution function, 163 
graph, 164 
joint, 211 
properties, 166 
Distribution table, 164 
Distributive laws (for sets), 29 
generalized, 36 
proof by membership table, 33 
verification by Venn diagram, 33 
Domain (of a function), 158 
Dominant gene, 124 
Duality principle, 34 


ELEMENT (of a set), 2 
Empty set, 5, 6 
as impossible event, 52 
probability of, 58 
Equally likely outcomes, 69 
Equivalence relation, 7 
Error, of estimate, 251 
of first kind, 274 
of second kind, 274 
Estimation, statistical, 181 
of binomial p, 265 
in hypergeometric, 271 
maximum-likelihood method, 271 
Euclid, 3 
Eugenics example, 129 n 
Events, characteristic random variable 
of, 171 
dependent, 102 
determined by a trial, 117, 120 
glossary of terms, 52 
as hypotheses, 94 
independent, 102 
from independent trials, 118, 120 
mutually exclusive in pairs, 73 
pairwise independence, 107 
probability of, 58 
probability of union of, 67 
simple, 55 
as subsets, 51 ff. 
Expected value, 173
Experiments, mathematical descrip- 
tion, 46 
compound, 78 
repeated, 113, 120 


FACTORIAL (n!), 184 

logarithms (table), 141 
Failure, as result of trial, 253 
Fair coins, dice, 61 




“Favorable” outcomes, 69 
Finite sample space, restriction to, 48 
Finite set, 2 
Fitted binomial distribution, 266 
Flags, numbers on, 64 
Frequency distribution, 221, 234, 266 
Function, definition, 158 
of random variable, 175, 213 
regression, 249 
Function-machine, 159, 175, 213 
Fundamental principle of counting, 9 
used to prove counting theorems, 
135 fi. 


GENERATING function, 270 
Genetics example, 123-131
Greatest integer symbol, 69 
Guessing of cards, 196, 219, 247 


HARDY-WEINBERG law, 128

Hypergeometric distribution, 270-72 

Hypotheses (see Testing statistical hy- 
potheses) 


IDEAL decision rule, 276
Idempotent laws (for sets), 28
Identically distributed random varia- 
bles, 168 
in sampling with replacement, 222 
in sampling without replacement, 234 
Identity laws (for sets), 28 
Impossible event, 52 
Independent events, 102, 109, 111 
from independent trials, 118 
multiplication rule for, 102, 109 
pairwise, 107 
Independent partitions, 107 
Independent random variables, 204, 
209 
correlation coefficient, 242 
in sampling with replacement, 222 
in sampling without replacement, 234 
Independent trials, 114, 120 
in Bernoulli process, 253 
product rule, 115, 120 
Infinite set, 2, 3, 47 
Inspection, sampling, 90, 269, 290 
Intersection of sets, 17, 35 


JOINT distribution function, 211
Joint probability function, 197 ff. 
definition, 202 
table, 203 
LAPLACE definition of probability, 70
Law of large numbers, 226 
Least-squares criterion, 250 

Limit theorem, 271 

Linear function, 244 

Linear regression function, 250 
Logarithms (tables), 141 

Losing coalition, 27 


MARGINAL probability function, 200,
204 
Master mechanic, 287 
Matching of cards, 63, 74, 196, 219, 247 
Mathematical expectation, 173 
Mathematical induction, 36, 81, 216
Maximum-likelihood estimation, 271 
Mechanical interpretation, of mean, 
178 
of variance, 195 
Mean, of binomial, 265 
conditional, 249 
of function of random variable, 176, 
215 
of hypergeometric, 271 
mechanical interpretation, 178 
of product of random variables, 217 
of random variable, 172-84 
of sample mean, with replacement, 
225 
without replacement, 237 
of sample variance, 231 
of standardized random variable, 192 
Mean absolute deviation, 196 
Mean cost of uncertainty, 290 
Median, 172, 184 
Membership table, 30 ff. 
Mendelian mating, 125 
Mode, 172, 184 
Moment of inertia, 195 
Mortality table, 88, 105 
Most probable number of successes, 269 
Multinomial coefficient, 154 
Multinomial theorem, 154 
Multiplication rule, 102, 109 
Mutation, 126 
Mutually exclusive events, 52 
in pairs, 73 
probability of union, 67 
Mutually exclusive sets, 20 


(n, c) Decision rule, 290 
Normal probability curve, 267, 270 
Null hypothesis, 273 ff. 
Null set (see Empty set)

Number of elements in a set, 2, 20, 43 
Number of r-subsets, 133, 135 
Numbers on flags, 64 


OBSERVED event, statistical significance, 282
Odds, 70 
Operating-characteristic curve, 90, 269 
Ordered n-tuple, 42, 132 
Ordered pair, 39 
Outcomes, as elements of sample space, 
46 
equally likely, 69 
“favorable,” 69 


Pairwise independence of events, 
107 
Panmixia, 125, 128 
Parallel-axis theorem, 196 
Parameters, of binomial, 256 
of hypergeometric, 270 
Partition, 91 
cross-partition, 107 
independent, 107 
Pascal’s triangle, 151
Percentile, 184 
Permutation, 63, 132, 135, 139 
Poker, 10, 143, 147, 148 
Polya urn model, 87, 99 
Population mean, 222, 225 
Population, sampling from, 222, 234 
Population variance, 222, 231 
Possible values of random variable, 166 
Power function, 275 ff. 
Prime number, 3 
Probability, acceptable assignment to
simple events, 55, 114, 120
a posteriori, 94 
a priori, 94 
basic definition, 58 
of complementary events, 67 
conditional, 76 
of empty set, 58 
extreme values, 66 
interpretations of, 228 
odds, 70 
of union, 67 
Probability chart, 162 
Probability function, 161 
binomial, 256 
conditional, 206 
graph, 162 
hypergeometric, 270 
joint, 202 
marginal, 200, 204 
properties, 165 
Probability table, 161 
Product rule for independent trials, 
115, 120 
Product set (see Cartesian product)
Production run, 286 


QUARTILES, 184 


RANDOM mating, 125

Random process, 254 

Random selection, 61 (see also Sampling)

Random variables, binomial, 257 ff. (see
also Distribution function;
Independent random variables;
Probability function)

characteristic, 171 

as composite function, 175 
conditional mean, 249
conditional probability function, 200 
correlation coefficient, 241 
covariance, 232, 247 

defined as function, 159 
dependent, 204 

determined by trials, 209 
equal with probability one, 245 
identically distributed, 168 
mean, 172 

mean absolute deviation, 196 
median, 172 

mode, 172 

possible values, 166 

regression function, 249 
standard deviation, 187
standardized, 192, 241, 267 
variance, 187 

Range (of a function), 158 

Rate of mortality, 89 

Recessive gene, 124 

Recursion formula, binomial coefficients, 156

binomial distribution, 269 

Reflexive relation, 7, 8 

Regression function, 249 

Regression under stress, 284 

Relation, equivalence, 7 

Rencontre, problem of, 74 


Repeated experiments, 113, 120 
Roster method, 2 


SAMPLE mean, 223 ff. 
mean of, 225, 237 
probability function, 224, 237 
variance of, 225, 237 
Sample run, 288 
Sample space, 45 ff. 
for Bernoulli trials, 253, 255 
as certain event, 52 
for compound experiment, 80 
infinite, 47 
for repeated trials, 113, 120 
restriction to finite, 48 
Sample variance, 231 
Sampling inspection plan, 90, 269, 290 
Sampling theory, 181 
Sampling, with replacement, 115, 205, 
221 ff., 239, 264 
without replacement, 123, 205, 
234 ff., 270 
Selection force, 126 
Series competition, 261 
Sets, algebra of, 28 (see also Events;
Sample spaces; Subsets)
associative laws, 29 
brace notation, 3 
Cartesian product, 40 
commutative laws, 29 
complement, 17, 28 
defining property, 2 
De Morgan's laws, 29 
disjoint, 20 
distributive laws, 29 
element, 2 
equality, 5 
finite, 2 
graph, 4 
idempotent laws, 28 
identity laws, 28 
intersection, 17 
mutually exclusive, 20 
number of elements, 2, 20, 43 
partition, 91 
roster method, 2 
symmetric difference, 37 
union, 17 
universal, 16 
Significance level, 282 
Simple event, 55 
Smoking habits, 105 
Square root, 191 
Standard deviation, 187
of binomial, 265 
of hypergeometric, 271 
of sample mean, 226 
Standard score, 192 
Standardized random variable, 192, 241 
binomial, 267 
Statistical estimation (see Estimation) 
Statistical hypothesis, 274 (see also 
Testing statistical hypotheses) 
Statistical significance, 282 
Step function, 164, 166 
Stochastic process, 254 
Subset, 8 (see also Events) 
number of, 11, 150 
r-subset, 133, 135 
Success, as result of trial, 253 
Sums of random variables, 212 ff. 
mean, 215, 216 
variance, 218, 219, 233 
Symmetric difference, 37 
Symmetric relation, 7, 8 


TABLES, common logarithms, 141 
cumulative binomial probabilities, 
259-60 
joint probabilities, 203 
logarithms of factorials, 141 
Testing statistical hypotheses, 272-285 
critical set, 274, 280 
decision rule, 274 ff.
errors, 274 
power function, 275 
significance tests, 282 
Transitive relation, 7, 8 
Tree diagram, 9, 78, 83, 84, 94, 96 
Trials, Bernoulli, 253 (see also Independent trials)
determine an event, 117, 120 
repeated, 113, 120 